Evolution guided generative flow networks

Authors: Zarif Ikram, Ling Pan, Dianbo Liu

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present a thorough investigation across a wide range of toy and real-world benchmark tasks showing the effectiveness of our method in handling long trajectories and sparse rewards. In this section, we validate EGFN on different synthetic and real-world tasks. Section 4.1 presents an investigation of EGFN's performance on long trajectories and sparse rewards, its generalizability across multiple GFlowNet objectives, and an ablation study of its components.
Researcher Affiliation | Academia | Zarif Ikram (National University of Singapore), Ling Pan (The Hong Kong University of Science and Technology), Dianbo Liu (National University of Singapore)
Pseudocode | Yes | Algorithm 1: Evolution Guided GFlowNet Training. Input: P_F: forward flow of the star agent with weights θ; pop_F: population of k agents with randomly initialized weights; D: prioritized replay buffer; E: number of episodes in an evaluation; ϵ: fraction of greedily selected elites; δ: online-to-offline sample ratio; γ: mutation strength.
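The evolutionary half of the algorithm's inputs (k agents, ϵ elite fraction, γ mutation strength) can be sketched as a minimal selection-and-mutation step. This is an illustrative sketch, not the authors' implementation: the helper names `mutate` and `evolve` and the flat-list weight representation are assumptions, and the star agent's gradient-based training and the replay buffer are omitted.

```python
import random

def mutate(weights, gamma, rng):
    """Gaussian perturbation of every weight with strength gamma."""
    return [w + rng.gauss(0.0, gamma) for w in weights]

def evolve(population, fitnesses, eps, gamma, rng):
    """Greedily keep the top-eps fraction as elites and refill the rest
    of the population with mutated copies of randomly chosen elites."""
    k = len(population)
    n_elite = max(1, int(eps * k))
    ranked = sorted(range(k), key=lambda i: fitnesses[i], reverse=True)
    elites = [population[i] for i in ranked[:n_elite]]
    children = [mutate(rng.choice(elites), gamma, rng)
                for _ in range(k - n_elite)]
    return elites + children
```

With the paper's settings (k = 5, ϵ = 0.2), each generation would keep a single elite and refill four mutated children from it.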
Open Source Code | Yes | We release the code at http://github.com/zarifikram/egfn.
Open Datasets | Yes | We first study the effectiveness of EGFN on the well-studied hypergrid task introduced by Bengio et al. (2021). We collect the chain pairs from the observed antibody space (OAS) database. Using this MDP, the GFlowNet agent's actions prepend or append to the nucleotide string. The reward is a DNA binding affinity to a human transcription factor, provided by Trabucco et al. (2022). In this experiment, we generate a small molecule graph based on the QM9 data (Ramakrishnan et al., 2014) that maximizes the energy gap between its HOMO and LUMO orbitals, thereby increasing its stability. To further show the robustness, we consider the task of generating CDR mutants on a hu4D5 antibody mutant dataset (Mason et al., 2021).
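The string-building MDP described above (actions prepend or append a nucleotide) can be sketched as a one-step transition function. This is a hypothetical helper for illustration only; the `step` name, the action encoding, and the alphabet handling are assumptions, not the paper's code.

```python
NUCLEOTIDES = "ACGT"

def step(state, action):
    """Apply one action to the current string state: prepend or append
    a single nucleotide character."""
    where, nt = action  # e.g. ("prepend", "A")
    assert nt in NUCLEOTIDES and where in ("prepend", "append")
    return nt + state if where == "prepend" else state + nt
```

Starting from the empty string, a trajectory of such actions grows a candidate sequence whose terminal reward is its binding affinity.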
Dataset Splits | Yes | To classify the generated samples, we train a binary classifier that achieves 85% accuracy on an IID validation set.
Hardware Specification | Yes | For example, we report the runtime analysis on the QM9 task (appendix D.3) in table 2, which is performed with an Intel Xeon Processor (Skylake, IBRS) with 512 GB of RAM and a single NVIDIA A100 GPU.
Software Dependencies | No | All our implementation code uses the PyTorch library (Paszke et al., 2019). We label the training sequences using Biopython (Cock et al., 2009). The paper mentions the PyTorch and Biopython libraries with citations, but does not provide specific version numbers for these software components.
Experiment Setup | Yes | For all the following experiments, we use k = 5, E = 4, ϵ = 0.2, and γ = 1. Architecture: We model the forward layer with a 3-layer MLP with 256 hidden dimensions, followed by a leaky ReLU. The forward layer takes the one-hot encoding of the states as input and outputs action logits. For FM, we simply use the forward layer to model the edge flow. For TB and DB, we double the action space and train the MLP as both the forward and backward flow. We use a learning rate of 10^-4 for FM and 10^-3 for both TB and DB, with a learning rate of 0.1 for Z_θ. The replay buffer uses a maximum size of 1000, and we use a worst-reward-first policy for replay replacement. We detail the summary of the training hyperparameters in table 4.
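The forward-policy architecture described above (one-hot state in, action logits out, hidden layers followed by a leaky ReLU) can be sketched as a plain forward pass. This is a dependency-free sketch for illustration, not the PyTorch model the paper trains: the helper names `mlp_logits` and `init_layers` are assumptions, the negative slope of the leaky ReLU is not stated in the excerpt, and the tiny layer sizes below stand in for the paper's 256 hidden dimensions.

```python
import random

def leaky_relu(x, slope=0.01):
    # slope is an assumed default; the excerpt does not specify it
    return x if x > 0.0 else slope * x

def linear(x, W, b):
    """One affine layer: W has one row per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def mlp_logits(one_hot, layers):
    """Forward pass of a small MLP: hidden layers use a leaky ReLU,
    the final layer emits raw action logits."""
    h = one_hot
    for i, (W, b) in enumerate(layers):
        h = linear(h, W, b)
        if i < len(layers) - 1:
            h = [leaky_relu(v) for v in h]
    return h

def init_layers(sizes, rng):
    """Random weight matrices for sizes like [in, hidden, hidden, out]."""
    return [([[rng.gauss(0.0, 0.1) for _ in range(m)] for _ in range(n)],
             [0.0] * n)
            for m, n in zip(sizes, sizes[1:])]
```

For the TB and DB objectives the output dimension would simply be doubled, so the same network serves as both the forward and backward policy.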