Towards Improving Exploration through Sibling Augmented GFlowNets
Authors: Kanika Madan, Alex Lamb, Emmanuel Bengio, Glen Berseth, Yoshua Bengio
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | An extensive set of experiments across a diverse range of tasks, reward structures and trajectory lengths, along with a thorough set of ablations, demonstrate the superior performance of SA-GFN in terms of exploration efficacy and convergence speed as compared to the existing methods. To evaluate SA-GFN against other baselines, we address the following research questions: 1. Number of discovered modes: We track the number of modes learnt by each method over a diverse range of task structures and reward settings; Section 5.1 and 5.2. 2. Learning of the true reward distribution: We measure the L1 error between the true reward distribution and the learnt empirical distribution for each method, Section 5.1. We also visualize the learnt empirical distributions at the end of the training to compare against the true reward; Section 10.7. |
| Researcher Affiliation | Collaboration | 1 Mila Québec AI Institute, Université de Montréal; 2 Microsoft Research; 3 Valence Labs. Corresponding author: EMAIL |
| Pseudocode | Yes | Algorithm 1: Sibling Augmented Generative Flow Networks (SA-GFN) |
| Open Source Code | No | The paper does not explicitly state that the source code for SA-GFN is released. It mentions that some experimental settings are 'based on the published codebase of Malkin et al. (2022) and Pan et al. (2022)' or 'expand on the published code of Bengio et al. (2021a) and Malkin et al. (2022)', referring to other works' codebases, not its own. |
| Open Datasets | Yes | To evaluate on a wide range of exploration tasks, we conducted experiments on the following four domains... (c) Bit Sequence Task: from Malkin et al. (2022)... (d) Small Molecule Generation: from (Bengio et al., 2021a)... |
| Dataset Splits | Yes | For the Bit Sequence Generation task: 'the definition of modes M, set of test sequences, distance metric are the same as in (Malkin et al., 2022).' For Small Molecule Generation: 'The proxy model giving the reward, the held-out set of molecules used to compute the correlation metric, and the GFlow Net model architecture and its hyperparameters are taken from Bengio et al. (2021a) and Malkin et al. (2022).' |
| Hardware Specification | No | The paper mentions experimental details such as batch size, number of updates, learning rates, and optimizer (Adam), but does not specify any hardware like GPU or CPU models. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and a 'Transformer based architecture (Vaswani et al., 2017)', but does not provide specific version numbers for any software libraries, frameworks, or programming languages used in the implementation. |
| Experiment Setup | Yes | All models are trained with the Adam optimizer with a batch size of 16 for a total of 20000 updates and 3 seeds. The learning rate is chosen from {0.001, 0.005, 0.01, 0.03} for the forward and backward policies PF and PB with the trajectory balance objective (Malkin et al., 2022), and the learning rate of Zθ is 10× the learning rate of PF and PB. Reward temperature values of {β^e_BN = 1.0, β^e_SN = 0.25, β_SN = 1.0, β_BN = 1.0, β_i = 1.0} are used. For intrinsic rewards, we choose RND rewards with the intrinsic reward coefficient chosen from {0.00005, 0.00001, 0.0005, 0.0001, 0.005, 0.001, 0.05, 0.01}. |
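The reported setup amounts to a small hyperparameter sweep. A minimal configuration sketch is below; this is an illustration of the reported values only, not the authors' code, and the names (`TrainConfig`, `TB_LR_GRID`, `sweep`) are assumptions:

```python
from dataclasses import dataclass
from itertools import product

# Grids reported in the paper (trajectory balance objective).
TB_LR_GRID = [0.001, 0.005, 0.01, 0.03]  # learning rate for P_F and P_B
RND_COEF_GRID = [0.00005, 0.00001, 0.0005, 0.0001,
                 0.005, 0.001, 0.05, 0.01]  # intrinsic (RND) reward coefficient

@dataclass
class TrainConfig:
    batch_size: int = 16
    num_updates: int = 20_000
    num_seeds: int = 3
    policy_lr: float = 0.001   # shared by P_F and P_B
    rnd_coef: float = 0.00005  # intrinsic reward coefficient

    @property
    def z_lr(self) -> float:
        # The partition-function estimate Z_theta is trained at 10x the policy learning rate.
        return 10 * self.policy_lr

def sweep() -> list[TrainConfig]:
    """Enumerate every (policy_lr, rnd_coef) pair in the reported grids."""
    return [TrainConfig(policy_lr=lr, rnd_coef=c)
            for lr, c in product(TB_LR_GRID, RND_COEF_GRID)]
```

With 4 learning rates and 8 intrinsic-reward coefficients, the full grid covers 32 configurations, each run with 3 seeds.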