Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets

Authors: Haoran He, Can Chang, Huazhe Xu, Ling Pan

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. Evidence: "Extensive empirical results show that our method improves sample efficiency by a large margin and outperforms strong baselines on various standard evaluation benchmarks. Our codes are available at https://github.com/tinnerhrhe/Goal-Conditioned-GFN."
Researcher Affiliation: Academia. Evidence: "Haoran He¹, Can Chang², Huazhe Xu², Ling Pan¹ — ¹Hong Kong University of Science and Technology, ²Tsinghua University. Correspondence to: Ling Pan (EMAIL)."
Pseudocode: Yes. Evidence: "Algorithm 1: Retrospective Backward Synthesis GFlowNets"
Open Source Code: Yes. Evidence: "Our codes are available at https://github.com/tinnerhrhe/Goal-Conditioned-GFN."
Open Datasets: No. Evidence: "We first conduct a series of experiments based on the Grid World environment (Bengio et al., 2021), in which the model learns to achieve any given goal starting in an H × H grid. In this section, we investigate the performance of RBS-GFN in the bit sequence generation task (Malkin et al., 2022). In this section, we study a more practical task of generating DNA sequences with high binding activity with targeted transcription factors (Jain et al., 2022). In this section, we study the antimicrobial peptides (AMP) (Jain et al., 2022) generation task for investigating the scalability of our proposed method."
Dataset Splits: No. The paper does not explicitly provide percentages, sample counts, or citations specifying training, validation, and test dataset splits.
Hardware Specification: Yes. Evidence: "We run all the experiments in this paper on an RTX 3090 machine."
Software Dependencies: No. The paper mentions using the "Adam (Kingma & Ba, 2014) optimizer" and "ReLU activation (Xu et al., 2015)", and building upon "publicly available open-source repositories" including https://github.com/GFNOrg/gflownet. However, it does not provide version numbers for software components such as Python, PyTorch, or CUDA, which are necessary for reproducible dependency information.
Experiment Setup: Yes. Evidence: "We use an MLP network that consists of 2 hidden layers with 2048 hidden units and ReLU activation (Xu et al., 2015). The trajectories are sampled from a parallel of 16 rollouts in the environment at each training step. We set the replay buffer size as 1e6 and use a batch size of 128 for sampling data and computing the loss function. We use the Adam (Kingma & Ba, 2014) optimizer with a learning rate of 1e-3 for 2e4 training steps (Grid World), 5e-4 for 1e5 training steps (Bit Sequence Generation), 5e-4 for 5e3 training steps (TF Bind Generation), and 5e-4 for 1e5 training steps (AMP Generation). In practice, we set C = 1e7 for the small task, C = 1e25 for the medium task, and C = 1e40 for the large task. We run each algorithm with three different seeds and report their performance in mean and standard deviation."
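The hyperparameters quoted above can be collected into a small configuration sketch. This is an illustrative, stdlib-only summary of the reported settings, not code from the authors' repository; all class, field, and key names (RBSGFNConfig, TASK_CONFIGS, BIT_SEQUENCE_C, etc.) are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RBSGFNConfig:
    """Shared settings reported in the paper (field names are ours)."""
    hidden_layers: int = 2          # MLP depth
    hidden_units: int = 2048        # units per hidden layer, ReLU activation
    parallel_rollouts: int = 16     # environments rolled out per training step
    replay_buffer_size: int = 1_000_000
    batch_size: int = 128
    lr: float = 1e-3                # Adam learning rate (task-dependent)
    training_steps: int = 20_000    # task-dependent

# Per-task learning rate and step count, as stated in the paper.
TASK_CONFIGS = {
    "grid_world":   RBSGFNConfig(lr=1e-3, training_steps=20_000),
    "bit_sequence": RBSGFNConfig(lr=5e-4, training_steps=100_000),
    "tf_bind":      RBSGFNConfig(lr=5e-4, training_steps=5_000),
    "amp":          RBSGFNConfig(lr=5e-4, training_steps=100_000),
}

# The constant C is set per task size (small/medium/large) in the paper.
BIT_SEQUENCE_C = {"small": 1e7, "medium": 1e25, "large": 1e40}
```

A reproduction would still need the missing details noted above (library versions, dataset splits), but this captures every numeric setting the paper states.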