Towards Improving Exploration through Sibling Augmented GFlowNets
Authors: Kanika Madan, Alex Lamb, Emmanuel Bengio, Glen Berseth, Yoshua Bengio
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | An extensive set of experiments across a diverse range of tasks, reward structures and trajectory lengths, along with a thorough set of ablations, demonstrate the superior performance of SA-GFN in terms of exploration efficacy and convergence speed as compared to the existing methods. To evaluate SA-GFN against other baselines, we address the following research questions: 1. Number of discovered modes: We track the number of modes learnt by each method over a diverse range of task structures and reward settings; Section 5.1 and 5.2. 2. Learning of the true reward distribution: We measure the L1 error between the true reward distribution and the learnt empirical distribution for each method, Section 5.1. We also visualize the learnt empirical distributions at the end of the training to compare against the true reward; Section 10.7. |
| Researcher Affiliation | Collaboration | 1 Mila Québec AI Institute, Université de Montréal; 2 Microsoft Research; 3 Valence Labs. Corresponding author: EMAIL |
| Pseudocode | Yes | Algorithm 1: Sibling Augmented Generative Flow Networks (SA-GFN) |
| Open Source Code | No | The paper does not explicitly state that the source code for SA-GFN is released. It mentions that some experimental settings are 'based on the published codebase of Malkin et al. (2022) and Pan et al. (2022)' or 'expand on the published code of Bengio et al. (2021a) and Malkin et al. (2022)', referring to other works' codebases, not its own. |
| Open Datasets | Yes | To evaluate on a wide range of exploration tasks, we conducted experiments on the following four domains... (c) Bit Sequence Task: from Malkin et al. (2022)... (d) Small Molecule Generation: from (Bengio et al., 2021a)... |
| Dataset Splits | Yes | For the Bit Sequence Generation task: 'the definition of modes M, set of test sequences, distance metric are the same as in (Malkin et al., 2022).' For Small Molecule Generation: 'The proxy model giving the reward, the held-out set of molecules used to compute the correlation metric, and the GFlow Net model architecture and its hyperparameters are taken from Bengio et al. (2021a) and Malkin et al. (2022).' |
| Hardware Specification | No | The paper mentions experimental details such as batch size, number of updates, learning rates, and optimizer (Adam), but does not specify any hardware like GPU or CPU models. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and a 'Transformer based architecture (Vaswani et al., 2017)', but does not provide specific version numbers for any software libraries, frameworks, or programming languages used in the implementation. |
| Experiment Setup | Yes | All models are trained with the Adam optimizer with a batch size of 16 for a total of 20000 updates and 3 seeds. The learning rate is chosen from {0.001, 0.005, 0.01, 0.03} for the forward and backward policies PF and PB with the trajectory balance objective (Malkin et al., 2022), and the learning rate of Zθ is 10× the learning rate of PF and PB. Reward temperature values of {β^e_BN = 1.0, β^e_SN = 0.25, β_SN = 1.0, β_BN = 1.0, β_i = 1.0} are used. For intrinsic rewards, we choose RND rewards with the intrinsic reward coefficient chosen from {0.00005, 0.00001, 0.0005, 0.0001, 0.005, 0.001, 0.05, 0.01}. |
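The reported setup amounts to a small hyperparameter sweep. A minimal configuration sketch is below; this is an illustration of the reported values only, not the authors' code, and the names (`TrainConfig`, `TB_LR_GRID`, `sweep`) are assumptions:

```python
from dataclasses import dataclass
from itertools import product

# Grids reported in the paper (trajectory balance objective).
TB_LR_GRID = [0.001, 0.005, 0.01, 0.03]  # learning rate for P_F and P_B
RND_COEF_GRID = [0.00005, 0.00001, 0.0005, 0.0001,
                 0.005, 0.001, 0.05, 0.01]  # intrinsic (RND) reward coefficient

@dataclass
class TrainConfig:
    batch_size: int = 16
    num_updates: int = 20_000
    num_seeds: int = 3
    policy_lr: float = 0.001   # shared by P_F and P_B
    rnd_coef: float = 0.00005  # intrinsic reward coefficient

    @property
    def z_lr(self) -> float:
        # The partition-function estimate Z_theta is trained at 10x the policy learning rate.
        return 10 * self.policy_lr

def sweep() -> list[TrainConfig]:
    """Enumerate every (policy_lr, rnd_coef) pair in the reported grids."""
    return [TrainConfig(policy_lr=lr, rnd_coef=c)
            for lr, c in product(TB_LR_GRID, RND_COEF_GRID)]
```

With 4 learning rates and 8 intrinsic-reward coefficients, the full grid covers 32 configurations, each run with 3 seeds.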