Investigating Generalization Behaviours of Generative Flow Networks

Authors: Lazar Atanackovic, Emmanuel Bengio

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we empirically verify some of the hypothesized mechanisms of generalization of GFlowNets. We accomplish this by introducing a novel graph-based benchmark environment where reward difficulty can be easily varied, p(x) can be computed exactly, and an unseen test set can be constructed to quantify generalization performance. Using this graph-based environment, we are able to systematically test the hypothesized mechanisms of generalization of GFlowNets and put forth a set of empirical observations that summarize our findings.
Researcher Affiliation | Collaboration | Lazar Atanackovic, EMAIL, University of Toronto and Vector Institute; Emmanuel Bengio, EMAIL, Valence Labs
Pseudocode | No | The paper describes the methodology in prose and refers to mathematical equations, but does not include any clearly labeled pseudocode or algorithm blocks. The Python code snippets in Section C.1.1 define reward functions, not the main algorithms.
Open Source Code | Yes | Our code is available at: https://github.com/lazaratan/gflownet-generalization
Open Datasets | Yes | To conduct our empirical investigation, we propose a set of new graph-based generation tasks to benchmark the performance of GFlowNets for learning unnormalized probability mass functions over discrete spaces. ... We define tasks of varying difficulty on a fixed state space, thus holding the environment constant while the reward difficulty is varied. For completeness, we also conduct experiments on two common benchmark tasks in the GFlowNet literature: the hypergrid and sequence tasks. ... Reward complexity: We define three different reward functions, which we hope to be of varying difficulty. ... We fully define and show the distribution of log R(x) of the respective tasks in C.1. ... Section C.1.1 contains Python code for the reward functions (cliques, neighbors, counting).
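The paper's actual reward definitions live in its Section C.1.1 and are not reproduced here. As an illustration only, a hedged sketch of what a "counting"-style graph reward might look like; the function name, signature, and exponential reward form are assumptions, not the paper's definitions:

```python
# Hypothetical sketch of a counting-style graph reward, NOT the paper's
# actual C.1.1 code. A graph is given as an edge list; the reward grows
# with the number of distinct undirected edges, so log R(x) is linear
# in edge count.
import math

def counting_reward(edges, beta=1.0):
    """Toy reward: exp(beta * #edges) over a deduplicated edge set."""
    # Deduplicate undirected edges so (u, v) and (v, u) count once.
    unique = {tuple(sorted(e)) for e in edges}
    return math.exp(beta * len(unique))

# A triangle on 3 nodes has 3 distinct edges, so log R(x) = 3 * beta.
r = counting_reward([(0, 1), (1, 2), (2, 0), (1, 0)])
```

Parameterizing log R(x) by a simple graph statistic like this is one way a benchmark can dial reward difficulty while keeping the state space fixed, which matches the design goal quoted above.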
Dataset Splits | Yes | We use a 90%-10% train-test split, and show the resulting test error in Figure 1(a). ... We use a 90%-10% train-test split. ... We conduct all experiments reported in this section over 3 random seeds.
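The paper states the split ratio and seed count but not the splitting procedure. A minimal sketch, assuming the split is drawn by shuffling object indices under a fixed seed (the authors' exact procedure may differ):

```python
# Seeded 90%-10% train-test split sketch. The shuffle-based procedure
# is an assumption; only the 90/10 ratio and the use of random seeds
# come from the paper.
import random

def train_test_split(items, test_frac=0.1, seed=0):
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)  # deterministic per seed
    n_test = int(len(items) * test_frac)
    test = [items[i] for i in idx[:n_test]]
    train = [items[i] for i in idx[n_test:]]
    return train, test

train, test = train_test_split(list(range(100)), test_frac=0.1, seed=0)
# 90 training objects, 10 held-out test objects per seed.
```

Re-running with a different `seed` yields a different held-out set, which is how results can be aggregated over the 3 random seeds mentioned above.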
Hardware Specification | Yes | All experiments were run on an HPC cluster of NVIDIA A100 GPUs for a total of approximately 2000 GPU hours. Only 1 GPU is required for each individual seed run of an experiment, typically taking between 24 hours and 3 days to complete, depending on the experiment.
Software Dependencies | No | Our experiments are implemented in PyTorch and PyTorch Geometric. ... We use the Adam optimizer with learning rate 0.0001 for all models. The paper mentions software tools like PyTorch, PyTorch Geometric, networkx, numpy, and the Adam optimizer, but does not specify their version numbers.
Experiment Setup | Yes | For all graph experiments we use a modified graph transformer (Veličković et al., 2017; Shi et al., 2020). ... We use 8 layers with 128-dimensional embeddings and 4 attention heads... For sequence tasks, we use a vanilla transformer (Vaswani et al., 2017) with 4 layers of 64-dimensional embeddings and 2 attention heads. ... For grid tasks we use a LeakyReLU MLP with 3 layers of 128 units. ... We use the Adam optimizer with learning rate 0.0001 for all models. ... We conduct all experiments reported in this section over 3 random seeds. For training online and offline GFlowNets, we use SubTB(1) (see D.2) and a uniform PB.
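The grid-task configuration quoted above (a LeakyReLU MLP with 3 layers of 128 units, trained with Adam at learning rate 0.0001) can be sketched in PyTorch as follows. The input/output dimensions and the helper's name are placeholders, not taken from the paper's code:

```python
# Hedged sketch of the described grid-task model: a LeakyReLU MLP with
# 3 hidden layers of 128 units, optimized with Adam at lr=1e-4. The
# in_dim/out_dim values are illustrative placeholders.
import torch
import torch.nn as nn

def make_grid_mlp(in_dim=64, out_dim=8, hidden=128, n_layers=3):
    layers, d = [], in_dim
    for _ in range(n_layers):
        layers += [nn.Linear(d, hidden), nn.LeakyReLU()]
        d = hidden
    layers.append(nn.Linear(d, out_dim))  # e.g. logits over actions
    return nn.Sequential(*layers)

model = make_grid_mlp()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```

The graph and sequence models (the modified graph transformer and the vanilla transformer) would follow the same pattern with their stated depths, embedding sizes, and head counts, but their exact architectures are defined in the released repository rather than reconstructed here.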