Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization
Authors: Timofei Gritsaev, Nikita Morozov, Sergey Samsonov, Daniil Tiapkin
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide an extensive experimental evaluation of the proposed approach across various benchmarks in combination with both RL and GFlowNet algorithms and demonstrate its faster convergence and mode discovery in complex environments. We provide an extensive experimental evaluation of TLM in four tasks, confirming the findings of Mohammadpour et al. (2024), which emphasize the benefits of training the backward policy in a complex environment with less structure. |
| Researcher Affiliation | Academia | Timofei Gritsaev, HSE University; Constructor University, Bremen (EMAIL). Nikita Morozov, HSE University (EMAIL). Sergey Samsonov, HSE University (EMAIL). Daniil Tiapkin, CMAP, CNRS, École polytechnique, Institut Polytechnique de Paris; Université Paris-Saclay, CNRS, Laboratoire de mathématiques d'Orsay (EMAIL). |
| Pseudocode | Yes | The complete procedure can be interpreted as a soft RL method with changing rewards. Our suggested method is summarized in Algorithm 1 and can be paired with any GFlowNet training method Alg (e.g., DB, TB, SubTB, or SoftDQN). Algorithm 1 Trajectory Likelihood Maximization |
| Open Source Code | Yes | Source code: github.com/tgritsaev/gflownet-tlm. |
| Open Datasets | Yes | Our final experiments are carried out on molecule design tasks of sEH (Bengio et al., 2021) and QM9 (Jain et al., 2023). In both tasks, the goal is to generate molecular graphs, with reward emphasizing some desirable property. For both problems, we use pre-trained reward proxy neural networks. For the sEH task, the model is trained to predict the binding energy of a molecule to a particular protein target (soluble epoxide hydrolase) (Bengio et al., 2021). For the QM9 task, the proxy is trained on the QM9 dataset (Ramakrishnan et al., 2014) to predict the HOMO-LUMO gap (Zhang et al., 2020). |
| Dataset Splits | Yes | For sEH, we use the test set from Bengio et al. (2021). For QM9, we select a subset of 773 molecules from the QM9 dataset (Ramakrishnan et al., 2014) containing between 3 and 8 atoms. The subset is constructed to ensure an approximately equal representation of different molecule sizes. |
| Hardware Specification | Yes | Each bit sequence experiment was performed on a single NVIDIA V100 GPU. Each molecule generation experiment was conducted on a single NVIDIA A100 GPU. Hypergrid experiments were performed on CPUs. |
| Software Dependencies | No | We utilize PyTorch (Paszke et al., 2019) in our experiments. Explanation: The paper mentions PyTorch but does not specify its version number, nor other key software components with versions. |
| Experiment Setup | Yes | All models are parameterized using an MLP with 2 hidden layers and 256 hidden units. We use the Adam optimizer with a learning rate of 10⁻³ and a batch size of 16 trajectories. For SubTB, we set λ = 0.9, following Madan et al. (2023). Tables 1, 2, and 3 further detail hyperparameters used across different experiments. |
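The Pseudocode row above summarizes the paper's core idea: the backward policy is trained by maximizing the likelihood of trajectories sampled during GFlowNet training. The snippet below is a minimal, self-contained sketch of that likelihood-maximization step on a toy DAG. The tiny graph, the fixed forward policy, and all variable names are illustrative assumptions for this page, not the paper's actual environments, architectures, or code (see the linked repository for those).

```python
import math
import random

random.seed(0)

# Toy DAG: s0 -> {s1, s2} -> s3. The terminal state s3 has two parents,
# so the backward policy P_B only has a real choice at s3.
parents = {1: [0], 2: [0], 3: [1, 2]}

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample_forward():
    """A fixed, non-uniform forward policy: route via s1 with prob. 0.8."""
    mid = 1 if random.random() < 0.8 else 2
    return [0, mid, 3]

# Backward-policy logits over the parents of s3, initialized uniform.
logits = [0.0, 0.0]

lr, batch_size, steps = 0.1, 16, 200
for _ in range(steps):
    grads = [0.0, 0.0]
    for _ in range(batch_size):
        traj = sample_forward()
        j = parents[3].index(traj[1])  # which parent this trajectory used
        probs = softmax(logits)
        for k in range(2):
            # Gradient of log softmax(logits)[j] with respect to logits[k]:
            # ascend the log-likelihood of the observed backward transition.
            grads[k] += (1.0 if k == j else 0.0) - probs[k]
    logits = [l + lr * g / batch_size for l, g in zip(logits, grads)]

pb_s1 = softmax(logits)[0]  # learned P_B(s1 | s3)
print(f"P_B(s1 | s3) ~= {pb_s1:.2f}")
```

Under this setup the backward policy converges toward the empirical frequency with which trajectories pass through each parent (here roughly 0.8 for s1), illustrating why a likelihood-trained P_B concentrates probability on the trajectories the sampler actually visits.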