Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Action abstractions for amortized sampling

Authors: Oussama Boussif, Léna Ezzine, Joseph Viviano, Michał Koziarski, Moksh Jain, Nikolay Malkin, Emmanuel Bengio, Rim Assouel, Yoshua Bengio

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In empirical evaluation on synthetic and real-world environments, our approach demonstrates improved sample efficiency in discovering diverse high-reward objects, especially on harder exploration problems. We evaluate our approach on multiple environments and find that it accelerates mode discovery, improves density estimation, and reduces the description length of the samples.
Researcher Affiliation Collaboration 1Mila – Québec AI Institute, 2Université de Montréal, 3University of Toronto, 4Valence Labs, Recursion, 5University of Edinburgh, 6CIFAR AI Chair
Pseudocode Yes Algorithm 1: Training policies with chunking
Open Source Code Yes Code is available at https://github.com/GFNOrg/Chunk-GFN.
Open Datasets Yes RNA sequence generation (L14 RNA1; Sinai et al., 2020)
Dataset Splits No The paper describes generating action sequences and using a replay buffer as part of its training strategy, but does not specify fixed training/test/validation dataset splits or predefined splits from external datasets for evaluation.
Hardware Specification No All runs use a single GPU, with runs taking up to 36 hours. The research was enabled by computational resources provided by the Digital Research Alliance of Canada (https://alliancecan.ca), Mila (https://mila.quebec), and NVIDIA.
Software Dependencies No The paper does not explicitly state specific software dependencies with version numbers, such as Python, PyTorch, or CUDA versions. It refers to Appendix B for implementation details but these details do not include software versions.
Experiment Setup Yes We run the samplers for a total of 31,250 iterations with a batch size of 64, adding up to a total of 2 million visited states during training. The forward policy... has a learning rate of 10^-4, whereas for SAC it is 3×10^-4. For GFlowNet, we use an initial value of 90 for the learnable log-partition and a learning rate of 10^-3. SAC uses an entropy coefficient of 0.2, and A2C and Option-Critic use an entropy coefficient of 0.5. We perform chunking every 1,250 iterations to obtain 25 chunks at the end of training.
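The quantities in the Experiment Setup row are internally consistent, which can be checked with a short sketch (variable names are illustrative and not taken from the paper's code):

```python
# Quick consistency check of the training schedule reported in the
# Experiment Setup row. All numbers come from the paper's reported setup;
# variable names are illustrative.

total_iterations = 31_250
batch_size = 64
chunking_interval = 1_250  # chunking is performed every 1,250 iterations

# 31,250 iterations x batch size 64 = 2,000,000 visited states.
visited_states = total_iterations * batch_size

# 31,250 / 1,250 = 25 chunks accumulated by the end of training.
num_chunks = total_iterations // chunking_interval

# Reported per-method hyperparameters.
learning_rates = {
    "forward_policy": 1e-4,
    "sac": 3e-4,
    "gflownet_log_partition": 1e-3,  # learnable log Z, initialized at 90
}
entropy_coefficients = {"sac": 0.2, "a2c": 0.5, "option_critic": 0.5}

print(visited_states, num_chunks)  # 2000000 25
```

The two derived values match the paper's stated totals of 2 million visited states and 25 chunks at the end of training.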