Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

When do GFlowNets learn the right distribution?

Authors: Tiago da Silva, Rodrigo Barreto Alves, Eliezer de Souza da Silva, Amauri Souza, Vikas Garg, Samuel Kaski, Diego Mesquita

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Importantly, all sections of this work provide experiments to substantiate our theoretical analyses, illustrating the claims and demonstrating the practical relevance of the methodological contributions.
Researcher Affiliation Collaboration Tiago da Silva, Rodrigo Barreto Alves, Eliezer de Souza da Silva (Getulio Vargas Foundation); Amauri Souza (Federal Institute of Ceará); Vikas Garg (Yai Yai Ltd and Aalto University); Samuel Kaski (Aalto University, Manchester University); Diego Mesquita (Getulio Vargas Foundation)
Pseudocode No The paper describes methods and equations in text format, but there are no explicitly labeled pseudocode or algorithm blocks.
Open Source Code No To avoid implementation bias, we reproduced our experiments using Pan et al. (2023a)'s publicly released code and obtained similar results. (Available online at github.com/ling-pan/FL-GFN.)
Open Datasets Yes We reproduce the experiments in Figure 8 for the task of sampling DNA sequences of length 10 in proportion to a reward function defined by wet-lab measurements of the sequence's binding affinity to a yeast transcription factor (PHO4) (Shen et al., 2023; Jain et al., 2022; Barrera et al., 2016; Trabucco et al., 2022).
Dataset Splits No The paper focuses on generative tasks and sampling from distributions, rather than using predefined datasets with explicit train/test/validation splits. For example, for 'Set generation', it states: 'The support X is defined as the collection of sets with 16 elements sampled from a deposit D = {1, . . . , 32}'. No specific dataset splits are mentioned for any of the tasks.
Hardware Specification Yes Experiments were run in a cluster equipped with A100 GPUs, using a single GPU per run.
Software Dependencies No All experiments relied on Adam (Kingma & Ba, 2014) with a learning rate of 10^-3 for p_F and 10^-2 for log Z for stochastic optimization (Madan et al., 2022).
Experiment Setup Yes All experiments relied on Adam (Kingma & Ba, 2014) with a learning rate of 10^-3 for p_F and 10^-2 for log Z for stochastic optimization (Madan et al., 2022). (...) Hypergrid. (...) we use a batch size of 16 trajectories and train the model for 62500 epochs (10^6 trajectories). We parameterize the forward policy with an MLP composed of two 128-dimensional layers. (...) To parameterize the policies of both the LA- and the standard GFlowNets, we use a 3-layer GIN (Xu et al., 2019) having 32-dimensional layers, followed by an MLP of two 32-dimensional layers.
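The quoted experiment setup can be consolidated into a single configuration sketch. This is purely illustrative: the class and field names below are assumptions, not taken from the paper's code; only the numeric values (learning rates, batch size, epoch count, layer widths) come from the quote above. The sanity check confirms that 16 trajectories per epoch over 62,500 epochs yields the stated 10^6 trajectories.

```python
# Illustrative sketch of the quoted hypergrid training configuration.
# All names (ExperimentConfig and its fields) are hypothetical; only the
# numeric values are taken from the paper's reported setup.
from dataclasses import dataclass


@dataclass
class ExperimentConfig:
    lr_forward_policy: float = 1e-3      # quoted learning rate for p_F (10^-3)
    lr_log_z: float = 1e-2               # quoted learning rate for log Z (10^-2)
    batch_size: int = 16                 # trajectories per batch (hypergrid task)
    epochs: int = 62_500
    mlp_hidden_dims: tuple = (128, 128)  # forward-policy MLP: two 128-dim layers

    @property
    def total_trajectories(self) -> int:
        # 16 trajectories/epoch x 62,500 epochs = 10^6, matching the quote
        return self.batch_size * self.epochs


cfg = ExperimentConfig()
print(cfg.total_trajectories)  # 1000000
```

Note the two distinct learning rates: in TB-style GFlowNet training, the scalar log Z typically gets a larger step size than the forward-policy network, which is consistent with the 10^-2 vs. 10^-3 values quoted here.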