Random Policy Evaluation Uncovers Policies of Generative Flow Networks

Authors: Haoran He, Emmanuel Bengio, Qingpeng Cai, Ling Pan

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results across extensive benchmarks demonstrate that RPE achieves competitive results compared to previous approaches, shedding light on the previously overlooked connection between (non-MaxEnt) RL and GFlowNets.
Researcher Affiliation | Collaboration | (1) Hong Kong University of Science and Technology, (2) Valence Labs, (3) Kuaishou Technology. Correspondence to: Ling Pan <EMAIL>.
Pseudocode | Yes | Algorithm 1: Policy Evaluation; Algorithm 2: Flow Iteration; Algorithm 3: Rectified Policy Evaluation.
Open Source Code | No | The paper states: "We implement all baselines based on open-source codes from Kim et al. (2023) and Tiapkin et al. (2024)." Footnotes 2 and 3 provide GitHub links to these baselines. However, the paper neither states that code for the proposed method (RPE) is released nor provides a link to its own implementation.
Open Datasets | Yes | "We compare RPE against GFlowNets and MaxEnt RL baselines across several GFlowNets tasks, including TFBind generation (Shen et al., 2023), RNA design (Kim et al., 2023), and molecule generation (Shen et al., 2023)." "We consider four distinct target transcriptions employing the ViennaRNA package (Lorenz et al., 2011) as studied in Pan et al. (2024)." "We study the variant of the QM9 molecule task as studied in prior GFlowNets work (Jain et al., 2023b; Shen et al., 2023; Kim et al., 2023)."
Dataset Splits | No | The paper references prior work for each task's experimental setup (e.g., "For the tree-structured TFBind task, we follow the experimental setup described in Jain et al. (2022)"), but it does not explicitly describe dataset splits (e.g., percentages, sample counts, or specific split files) within its own text.
Hardware Specification | Yes | "We run all the experiments in this paper with RTX 3090 GPU."
Software Dependencies | No | The paper mentions using an MLP network with ReLU activation and the Adam optimizer, and refers to external open-source codebases for baselines. However, it does not specify software components (e.g., Python, PyTorch, TensorFlow) with version numbers for its own implementation.
Experiment Setup | Yes | "We use an MLP network that consists of 2 hidden layers with 2048 hidden units and ReLU activation (Xu et al., 2015) to estimate the flow function Fθ. We clip gradient norms to a maximum of 10.0 to prevent unstable gradient updates. We train our model for 1e4 steps, using the Adam optimizer (Kingma & Ba, 2014) with a 3e-3 learning rate. We set the reward threshold as 0.8 and the distance threshold as 3 to compute the number of modes discovered during training."
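The "Policy Evaluation" primitive named in the paper's Algorithm 1 can be illustrated with a minimal tabular sketch: under a uniform random policy on a small DAG-structured MDP, iterating the Bellman expectation backup converges to that policy's value function. The state space, transitions, and terminal reward below are hypothetical placeholders for illustration, not the paper's benchmarks or its exact reward/flow construction.

```python
import numpy as np

# Tiny 4-state DAG: transitions[s] lists the successors of s (terminal: empty).
n_states = 4
transitions = {0: [1, 2], 1: [3], 2: [3], 3: []}
reward = {3: 1.0}  # reward only at the terminal state (assumption)

V = np.zeros(n_states)
for _ in range(100):  # iterate the Bellman expectation backup to a fixed point
    V_new = np.zeros(n_states)
    for s, succs in transitions.items():
        if not succs:
            V_new[s] = reward.get(s, 0.0)  # terminal value is its reward
        else:
            # Uniform random policy: average over successors (gamma = 1).
            V_new[s] = np.mean([reward.get(s, 0.0) + V[sp] for sp in succs])
    V = V_new
```

With a single terminal reward and an undiscounted uniform policy, every state's value converges to the expected terminal reward reachable from it.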
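The training configuration quoted in the Experiment Setup row (a 2-hidden-layer MLP with 2048 ReLU units, Adam at a 3e-3 learning rate, and gradient norms clipped to 10.0) can be sketched in PyTorch as follows. The input dimension, action count, per-action output head, and the `training_step` helper are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

state_dim = 64   # assumption: task-dependent state-encoding size
n_actions = 8    # assumption: task-dependent number of actions

# Flow-function estimator per the paper's description: MLP with 2 hidden
# layers of 2048 units and ReLU activations.
flow_net = nn.Sequential(
    nn.Linear(state_dim, 2048),
    nn.ReLU(),
    nn.Linear(2048, 2048),
    nn.ReLU(),
    nn.Linear(2048, n_actions),  # one flow estimate per action (assumption)
)

optimizer = torch.optim.Adam(flow_net.parameters(), lr=3e-3)

def training_step(batch_states: torch.Tensor, loss_fn) -> float:
    """One gradient step with the gradient-norm clipping from the paper."""
    loss = loss_fn(flow_net(batch_states))
    optimizer.zero_grad()
    loss.backward()
    # Clip gradient norms to a maximum of 10.0, as stated in the setup.
    torch.nn.utils.clip_grad_norm_(flow_net.parameters(), max_norm=10.0)
    optimizer.step()
    return loss.item()
```

The loss function is left as a parameter because the row above does not quote the paper's training objective; any differentiable loss over the network output can be plugged in.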