Random Policy Evaluation Uncovers Policies of Generative Flow Networks

Authors: Haoran He, Emmanuel Bengio, Qingpeng Cai, Ling Pan

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results across extensive benchmarks demonstrate that RPE achieves competitive results compared to previous approaches, shedding light on the previously overlooked connection between (non-MaxEnt) RL and GFlowNets.
Researcher Affiliation | Collaboration | (1) Hong Kong University of Science and Technology, (2) Valence Labs, (3) Kuaishou Technology. Correspondence to: Ling Pan <EMAIL>.
Pseudocode | Yes | Algorithm 1: Policy Evaluation; Algorithm 2: Flow Iteration; Algorithm 3: Rectified Policy Evaluation.
Open Source Code | No | The paper states: "We implement all baselines based on open-source codes from Kim et al. (2023) and Tiapkin et al. (2024)." Footnotes 2 and 3 provide GitHub links to these baselines. However, the paper neither states that code for the proposed method (RPE) is released nor provides a link to its own implementation.
Open Datasets | Yes | "We compare RPE against GFlowNets and MaxEnt RL baselines across several GFlowNets tasks, including TFBind generation (Shen et al., 2023), RNA design (Kim et al., 2023), and molecule generation (Shen et al., 2023)." "We consider four distinct target transcriptions employing the ViennaRNA package (Lorenz et al., 2011) as studied in Pan et al. (2024)." "We study the variant of the QM9 molecule task as studied in prior GFlowNets work (Jain et al., 2023b; Shen et al., 2023; Kim et al., 2023)."
Dataset Splits | No | The paper references prior work for each task's experimental setup (e.g., "For the tree-structured TFBind task, we follow the experimental setup described in Jain et al. (2022)"), but it does not explicitly describe dataset splits (e.g., percentages, sample counts, or specific split files) within its own text.
Hardware Specification | Yes | "We run all the experiments in this paper with RTX 3090 GPU."
Software Dependencies | No | The paper mentions using an MLP network with ReLU activation and the Adam optimizer, and refers to external open-source codebases for baselines. However, it does not specify software components (e.g., Python, PyTorch, TensorFlow) with version numbers for its own implementation.
Experiment Setup | Yes | "We use an MLP network that consists of 2 hidden layers with 2048 hidden units and ReLU activation (Xu et al., 2015) to estimate the flow function Fθ. We clip gradient norms to a maximum of 10.0 to prevent unstable gradient updates. We train our model for 1e4 steps, using the Adam optimizer (Kingma & Ba, 2014) with a 3e-3 learning rate. We set the reward threshold as 0.8 and the distance threshold as 3 to compute the number of modes discovered during training."
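The "Policy Evaluation" primitive named in the paper's Algorithm 1 can be illustrated with a minimal tabular sketch: under a uniform random policy on a small DAG-structured MDP, iterating the Bellman expectation backup converges to that policy's value function. The state space, transitions, and terminal reward below are hypothetical placeholders for illustration, not the paper's benchmarks or its exact reward/flow construction.

```python
import numpy as np

# Tiny 4-state DAG: transitions[s] lists the successors of s (terminal: empty).
n_states = 4
transitions = {0: [1, 2], 1: [3], 2: [3], 3: []}
reward = {3: 1.0}  # reward only at the terminal state (assumption)

V = np.zeros(n_states)
for _ in range(100):  # iterate the Bellman expectation backup to a fixed point
    V_new = np.zeros(n_states)
    for s, succs in transitions.items():
        if not succs:
            V_new[s] = reward.get(s, 0.0)  # terminal value is its reward
        else:
            # Uniform random policy: average over successors (gamma = 1).
            V_new[s] = np.mean([reward.get(s, 0.0) + V[sp] for sp in succs])
    V = V_new
```

With a single terminal reward and an undiscounted uniform policy, every state's value converges to the expected terminal reward reachable from it.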
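The training configuration quoted in the Experiment Setup row (a 2-hidden-layer MLP with 2048 ReLU units, Adam at a 3e-3 learning rate, and gradient norms clipped to 10.0) can be sketched in PyTorch as follows. The input dimension, action count, per-action output head, and the `training_step` helper are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

state_dim = 64   # assumption: task-dependent state-encoding size
n_actions = 8    # assumption: task-dependent number of actions

# Flow-function estimator per the paper's description: MLP with 2 hidden
# layers of 2048 units and ReLU activations.
flow_net = nn.Sequential(
    nn.Linear(state_dim, 2048),
    nn.ReLU(),
    nn.Linear(2048, 2048),
    nn.ReLU(),
    nn.Linear(2048, n_actions),  # one flow estimate per action (assumption)
)

optimizer = torch.optim.Adam(flow_net.parameters(), lr=3e-3)

def training_step(batch_states: torch.Tensor, loss_fn) -> float:
    """One gradient step with the gradient-norm clipping from the paper."""
    loss = loss_fn(flow_net(batch_states))
    optimizer.zero_grad()
    loss.backward()
    # Clip gradient norms to a maximum of 10.0, as stated in the setup.
    torch.nn.utils.clip_grad_norm_(flow_net.parameters(), max_norm=10.0)
    optimizer.step()
    return loss.item()
```

The loss function is left as a parameter because the row above does not quote the paper's training objective; any differentiable loss over the network output can be plugged in.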