Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization

Authors: Timofei Gritsaev, Nikita Morozov, Sergey Samsonov, Daniil Tiapkin

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide an extensive experimental evaluation of the proposed approach across various benchmarks in combination with both RL and GFlowNet algorithms and demonstrate its faster convergence and mode discovery in complex environments." "We provide an extensive experimental evaluation of TLM in four tasks, confirming the findings of Mohammadpour et al. (2024), which emphasize the benefits of training the backward policy in a complex environment with less structure."
Researcher Affiliation | Academia | Timofei Gritsaev (HSE University; Constructor University, Bremen), Nikita Morozov (HSE University), Sergey Samsonov (HSE University), Daniil Tiapkin (CMAP, CNRS, École Polytechnique, Institut Polytechnique de Paris; Université Paris-Saclay, CNRS, Laboratoire de mathématiques d'Orsay)
Pseudocode | Yes | "The complete procedure can be interpreted as a soft RL method with changing rewards. Our suggested method is summarized in Algorithm 1 and can be paired with any GFlowNet training method Alg (e.g., DB, TB, SubTB, or SoftDQN)." Algorithm 1: Trajectory Likelihood Maximization.
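The core idea the row above describes (training the backward policy by maximizing the likelihood of sampled trajectories, interleaved with any GFlowNet objective) can be sketched in a few lines. This is a minimal toy illustration, not the authors' implementation: the tabular backward policy `pb_logits`, the state/action encoding, and the fixed example trajectories are all assumptions made for the sketch.

```python
import torch

# Hypothetical toy setup: a tabular backward policy over a small state space.
# The TLM step does gradient ascent on the backward log-likelihood of
# trajectories sampled during GFlowNet training (sampling is omitted here).
n_states, n_actions = 8, 3
pb_logits = torch.zeros(n_states, n_actions, requires_grad=True)
opt = torch.optim.Adam([pb_logits], lr=1e-3)

def tlm_step(trajectories):
    """One TLM update: minimize the negative backward log-likelihood.

    trajectories: list of (states, actions) integer tensors, where
    actions[t] is the backward action reconstructing states[t].
    """
    loss = torch.tensor(0.0)
    for states, actions in trajectories:
        log_pb = torch.log_softmax(pb_logits[states], dim=-1)
        loss = loss - log_pb.gather(-1, actions.unsqueeze(-1)).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with two fixed trajectories; in the algorithm this step would
# alternate with a forward-policy update under DB/TB/SubTB/SoftDQN.
trajs = [(torch.tensor([0, 1, 2]), torch.tensor([1, 0, 2])),
         (torch.tensor([3, 4]), torch.tensor([2, 1]))]
before = tlm_step(trajs)
after = tlm_step(trajs)
assert after < before  # the sampled trajectories become more likely under P_B
```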
Open Source Code | Yes | Source code: github.com/tgritsaev/gflownet-tlm
Open Datasets | Yes | Our final experiments are carried out on the molecule design tasks sEH (Bengio et al., 2021) and QM9 (Jain et al., 2023). In both tasks, the goal is to generate molecular graphs, with a reward emphasizing some desirable property. For both problems, we use pre-trained reward proxy neural networks. For the sEH task, the model is trained to predict the binding energy of a molecule to a particular protein target (soluble epoxide hydrolase) (Bengio et al., 2021). For the QM9 task, the proxy is trained on the QM9 dataset (Ramakrishnan et al., 2014) to predict the HOMO-LUMO gap (Zhang et al., 2020).
Dataset Splits | Yes | For sEH, we use the test set from Bengio et al. (2021). For QM9, we select a subset of 773 molecules from the QM9 dataset (Ramakrishnan et al., 2014) containing between 3 and 8 atoms. The subset is constructed to ensure an approximately equal representation of different molecule sizes.
Hardware Specification | Yes | Each bit sequence experiment was performed on a single NVIDIA V100 GPU. Each molecule generation experiment was conducted on a single NVIDIA A100 GPU. Hypergrid experiments were performed on CPUs.
Software Dependencies | No | "We utilize PyTorch (Paszke et al., 2019) in our experiments." Explanation: The paper mentions PyTorch but specifies neither its version number nor the versions of other key software components.
Experiment Setup | Yes | All models are parameterized using an MLP with 2 hidden layers and 256 hidden units. We use the Adam optimizer with a learning rate of 10^-3 and a batch size of 16 trajectories. For SubTB, we set λ = 0.9, following Madan et al. (2023). Tables 1, 2, and 3 further detail the hyperparameters used across different experiments.
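The setup in the row above (an MLP with 2 hidden layers of 256 units, Adam at learning rate 10^-3, batches of 16) can be reproduced with a short PyTorch sketch. The input and output dimensions below are placeholders, since they depend on the environment's state and action encoding.

```python
import torch
import torch.nn as nn

# Sketch of the stated parameterization: MLP with 2 hidden layers
# of 256 units, trained with Adam at lr = 1e-3. The in/out dims
# (32 and 10 here) are illustrative assumptions.
def make_policy(in_dim, out_dim, hidden=256):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

policy = make_policy(in_dim=32, out_dim=10)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

batch = torch.randn(16, 32)  # batch size of 16, as in the paper
logits = policy(batch)
assert logits.shape == (16, 10)
```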