Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization

Authors: Timofei Gritsaev, Nikita Morozov, Sergey Samsonov, Daniil Tiapkin

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide an extensive experimental evaluation of the proposed approach across various benchmarks in combination with both RL and GFlowNet algorithms and demonstrate its faster convergence and mode discovery in complex environments." "We provide an extensive experimental evaluation of TLM in four tasks, confirming the findings of Mohammadpour et al. (2024), which emphasize the benefits of training the backward policy in a complex environment with less structure."
Researcher Affiliation | Academia | Timofei Gritsaev (HSE University; Constructor University, Bremen), Nikita Morozov (HSE University), Sergey Samsonov (HSE University), Daniil Tiapkin (CMAP, CNRS, École Polytechnique, Institut Polytechnique de Paris; Université Paris-Saclay, CNRS, Laboratoire de mathématiques d'Orsay)
Pseudocode | Yes | "The complete procedure can be interpreted as a soft RL method with changing rewards. Our suggested method is summarized in Algorithm 1 and can be paired with any GFlowNet training method Alg (e.g., DB, TB, SubTB, or SoftDQN)." Algorithm 1: Trajectory Likelihood Maximization.
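The core idea the row above describes (training the backward policy by maximizing the likelihood of sampled trajectories, interleaved with any GFlowNet objective) can be sketched in a few lines. This is a minimal toy illustration, not the authors' implementation: the tabular backward policy `pb_logits`, the state/action encoding, and the fixed example trajectories are all assumptions made for the sketch.

```python
import torch

# Hypothetical toy setup: a tabular backward policy over a small state space.
# The TLM step does gradient ascent on the backward log-likelihood of
# trajectories sampled during GFlowNet training (sampling is omitted here).
n_states, n_actions = 8, 3
pb_logits = torch.zeros(n_states, n_actions, requires_grad=True)
opt = torch.optim.Adam([pb_logits], lr=1e-3)

def tlm_step(trajectories):
    """One TLM update: minimize the negative backward log-likelihood.

    trajectories: list of (states, actions) integer tensors, where
    actions[t] is the backward action reconstructing states[t].
    """
    loss = torch.tensor(0.0)
    for states, actions in trajectories:
        log_pb = torch.log_softmax(pb_logits[states], dim=-1)
        loss = loss - log_pb.gather(-1, actions.unsqueeze(-1)).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with two fixed trajectories; in the algorithm this step would
# alternate with a forward-policy update under DB/TB/SubTB/SoftDQN.
trajs = [(torch.tensor([0, 1, 2]), torch.tensor([1, 0, 2])),
         (torch.tensor([3, 4]), torch.tensor([2, 1]))]
before = tlm_step(trajs)
after = tlm_step(trajs)
assert after < before  # the sampled trajectories become more likely under P_B
```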
Open Source Code | Yes | Source code: github.com/tgritsaev/gflownet-tlm
Open Datasets | Yes | Our final experiments are carried out on the molecule design tasks sEH (Bengio et al., 2021) and QM9 (Jain et al., 2023). In both tasks, the goal is to generate molecular graphs, with a reward emphasizing some desirable property. For both problems, we use pre-trained reward proxy neural networks. For the sEH task, the model is trained to predict the binding energy of a molecule to a particular protein target (soluble epoxide hydrolase) (Bengio et al., 2021). For the QM9 task, the proxy is trained on the QM9 dataset (Ramakrishnan et al., 2014) to predict the HOMO-LUMO gap (Zhang et al., 2020).
Dataset Splits | Yes | For sEH, we use the test set from Bengio et al. (2021). For QM9, we select a subset of 773 molecules from the QM9 dataset (Ramakrishnan et al., 2014) containing between 3 and 8 atoms. The subset is constructed to ensure an approximately equal representation of different molecule sizes.
Hardware Specification | Yes | Each bit sequence experiment was performed on a single NVIDIA V100 GPU. Each molecule generation experiment was conducted on a single NVIDIA A100 GPU. Hypergrid experiments were performed on CPUs.
Software Dependencies | No | "We utilize PyTorch (Paszke et al., 2019) in our experiments." Explanation: The paper mentions PyTorch but specifies neither its version number nor the versions of other key software components.
Experiment Setup | Yes | All models are parameterized using an MLP with 2 hidden layers and 256 hidden units. We use the Adam optimizer with a learning rate of 10^-3 and a batch size of 16 trajectories. For SubTB, we set λ = 0.9, following Madan et al. (2023). Tables 1, 2, and 3 further detail the hyperparameters used across different experiments.
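The setup in the row above (an MLP with 2 hidden layers of 256 units, Adam at learning rate 10^-3, batches of 16) can be reproduced with a short PyTorch sketch. The input and output dimensions below are placeholders, since they depend on the environment's state and action encoding.

```python
import torch
import torch.nn as nn

# Sketch of the stated parameterization: MLP with 2 hidden layers
# of 256 units, trained with Adam at lr = 1e-3. The in/out dims
# (32 and 10 here) are illustrative assumptions.
def make_policy(in_dim, out_dim, hidden=256):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

policy = make_policy(in_dim=32, out_dim=10)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

batch = torch.randn(16, 32)  # batch size of 16, as in the paper
logits = policy(batch)
assert logits.shape == (16, 10)
```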