Investigating Generalization Behaviours of Generative Flow Networks

Authors: Lazar Atanackovic, Emmanuel Bengio

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we empirically verify some of the hypothesized mechanisms of generalization of GFlowNets. We accomplish this by introducing a novel graph-based benchmark environment where reward difficulty can be easily varied, p(x) can be computed exactly, and an unseen test set can be constructed to quantify generalization performance. Using this graph-based environment, we are able to systematically test the hypothesized mechanisms of generalization of GFlowNets and put forth a set of empirical observations that summarize our findings.
Researcher Affiliation | Collaboration | Lazar Atanackovic, EMAIL, University of Toronto and Vector Institute; Emmanuel Bengio, EMAIL, Valence Labs
Pseudocode | No | The paper describes the methodology in prose and refers to mathematical equations, but does not include any clearly labeled pseudocode or algorithm blocks. The Python code snippets in Section C.1.1 define reward functions, not the main algorithms.
Open Source Code | Yes | Our code is available at: https://github.com/lazaratan/gflownet-generalization
Open Datasets | Yes | To conduct our empirical investigation, we propose a set of new graph-based generation tasks to benchmark the performance of GFlowNets for learning unnormalized probability mass functions over discrete spaces. ... We define tasks of varying difficulty on a fixed state space, thus holding the environment constant while the reward difficulty is varied. For completeness, we also conduct experiments on two common benchmark tasks in the GFlowNet literature: the hypergrid and sequence tasks. ... Reward complexity: We define three different reward functions, which we hope to be of varying difficulty. ... We fully define and show the distribution of log R(x) of the respective tasks in C.1. ... Section C.1.1 contains Python code for the reward functions (cliques, neighbors, counting).
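The paper's actual reward definitions live in its Section C.1.1 and are not reproduced here. As an illustration only, a hedged sketch of what a "counting"-style graph reward might look like; the function name, signature, and exponential reward form are assumptions, not the paper's definitions:

```python
# Hypothetical sketch of a counting-style graph reward, NOT the paper's
# actual C.1.1 code. A graph is given as an edge list; the reward grows
# with the number of distinct undirected edges, so log R(x) is linear
# in edge count.
import math

def counting_reward(edges, beta=1.0):
    """Toy reward: exp(beta * #edges) over a deduplicated edge set."""
    # Deduplicate undirected edges so (u, v) and (v, u) count once.
    unique = {tuple(sorted(e)) for e in edges}
    return math.exp(beta * len(unique))

# A triangle on 3 nodes has 3 distinct edges, so log R(x) = 3 * beta.
r = counting_reward([(0, 1), (1, 2), (2, 0), (1, 0)])
```

Parameterizing log R(x) by a simple graph statistic like this is one way a benchmark can dial reward difficulty while keeping the state space fixed, which matches the design goal quoted above.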
Dataset Splits | Yes | We use a 90%-10% train-test split, and show the resulting test error in Figure 1(a). ... We use a 90%-10% train-test split. ... We conduct all experiments reported in this section over 3 random seeds.
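The paper states the split ratio and seed count but not the splitting procedure. A minimal sketch, assuming the split is drawn by shuffling object indices under a fixed seed (the authors' exact procedure may differ):

```python
# Seeded 90%-10% train-test split sketch. The shuffle-based procedure
# is an assumption; only the 90/10 ratio and the use of random seeds
# come from the paper.
import random

def train_test_split(items, test_frac=0.1, seed=0):
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)  # deterministic per seed
    n_test = int(len(items) * test_frac)
    test = [items[i] for i in idx[:n_test]]
    train = [items[i] for i in idx[n_test:]]
    return train, test

train, test = train_test_split(list(range(100)), test_frac=0.1, seed=0)
# 90 training objects, 10 held-out test objects per seed.
```

Re-running with a different `seed` yields a different held-out set, which is how results can be aggregated over the 3 random seeds mentioned above.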
Hardware Specification | Yes | All experiments were run on an HPC cluster of NVIDIA A100 GPUs for a total of approximately 2000 GPU hours. Only 1 GPU is required for each individual seed run of an experiment, typically taking between 24 hours and 3 days to complete, depending on the experiment.
Software Dependencies | No | Our experiments are implemented in PyTorch and PyTorch Geometric. ... We use the Adam optimizer with learning rate 0.0001 for all models. The paper mentions software tools like PyTorch, PyTorch Geometric, networkx, numpy, and the Adam optimizer, but does not specify their version numbers.
Experiment Setup | Yes | For all graph experiments we use a modified graph transformer (Veličković et al., 2017; Shi et al., 2020). ... We use 8 layers with 128-dimensional embeddings and 4 attention heads... For sequence tasks, we use a vanilla transformer (Vaswani et al., 2017) with 4 layers of 64-dimensional embeddings and 2 attention heads. ... For grid tasks we use a LeakyReLU MLP with 3 layers of 128 units. ... We use the Adam optimizer with learning rate 0.0001 for all models. ... We conduct all experiments reported in this section over 3 random seeds. For training online and offline GFlowNets, we use SubTB(1) (see D.2) and a uniform PB.
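The grid-task configuration quoted above (a LeakyReLU MLP with 3 layers of 128 units, trained with Adam at learning rate 0.0001) can be sketched in PyTorch as follows. The input/output dimensions and the helper's name are placeholders, not taken from the paper's code:

```python
# Hedged sketch of the described grid-task model: a LeakyReLU MLP with
# 3 hidden layers of 128 units, optimized with Adam at lr=1e-4. The
# in_dim/out_dim values are illustrative placeholders.
import torch
import torch.nn as nn

def make_grid_mlp(in_dim=64, out_dim=8, hidden=128, n_layers=3):
    layers, d = [], in_dim
    for _ in range(n_layers):
        layers += [nn.Linear(d, hidden), nn.LeakyReLU()]
        d = hidden
    layers.append(nn.Linear(d, out_dim))  # e.g. logits over actions
    return nn.Sequential(*layers)

model = make_grid_mlp()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```

The graph and sequence models (the modified graph transformer and the vanilla transformer) would follow the same pattern with their stated depths, embedding sizes, and head counts, but their exact architectures are defined in the released repository rather than reconstructed here.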