State Combinatorial Generalization In Decision Making With Conditional Diffusion Models

Authors: Xintong Duan, Yutong He, Fahim Tajwar, Wentse Chen, Ruslan Salakhutdinov, Jeff Schneider

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experiments in maze, driving, and multiagent environments, we show that conditioned diffusion models outperform traditional RL techniques and highlight the broad applicability of our problem formulation.
Researcher Affiliation | Academia | Carnegie Mellon University EMAIL
Pseudocode | Yes | Algorithm 1: Planning with Attention-based Composition Conditioned Diffusion Model
Open Source Code | No | The paper uses third-party codebases such as stable_baseline3 (Raffin et al., 2021) for PPO and OpenRL (Huang et al., 2023) for MAPPO, but includes no explicit statement or link releasing the authors' own implementation of the described methodology.
Open Datasets | Yes | Experimentally, we evaluate the models on three distinct RL environments: maze, driving, and multiagent games. All three settings are easily adaptable to the OOC generalization problem using existing RL frameworks, demonstrating the broad applicability of the combinatorial state setup. ... Highway-Env (Leurent, 2018) is a self-driving environment ... The StarCraft Multi-Agent Challenge (SMAC/SMACv2) (Samvelyan et al., 2019; Ellis et al., 2022) is a multi-agent collaborative game ... In Maze2D (Fu et al., 2020)
Dataset Splits | Yes | During training time, the environment will only generate traffic of all cars or all bicycles with equal probability. During test time, environments will generate a mixture of cars and bicycles (detailed setup in Appendix C.7). ... We evaluate on two OOC scenarios: (1) Simple (different but overlapping support): train the model on randomly generated combinations (ABC) of all units and test it where all units on the team have the same type (AAA); (2) Hard (non-overlapping support): the opposite scenario, where we train on teams with only one unit type (AAA) but at test time see any composition of these three units (ABC).
Hardware Specification | Yes | Experiments are run on a single NVIDIA RTX A6000 GPU, with all code implemented in PyTorch.
Software Dependencies | No | The paper names PyTorch as the implementation framework and refers to stable_baseline3 (Raffin et al., 2021) for PPO and OpenRL (Huang et al., 2023) for MAPPO, but it does not provide version numbers for these components, which a reproducible description requires.
Experiment Setup | Yes | The hyperparameters shared for large and medium mazes are shown below in Table 1. ... Detailed parameters for PPO and diffusion are shown below in Table 3 and Table 4. ... Detailed parameters for MAPPO can be found in Table 6. ... Detailed parameters for training a conditioned diffusion model for 5v5 are shown below in Table 7 and 3v3 in Table 8.
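The two OOC scenarios quoted under Dataset Splits (train on mixed ABC teams, test on uniform AAA teams, or the reverse) can be sketched as a small composition sampler. This is an illustrative sketch only: the unit names, the `sample_team` function, and the team size are assumptions for demonstration, not details taken from the paper.

```python
import random

# Hypothetical three-unit pool standing in for the paper's SMAC unit types.
UNIT_TYPES = ["A", "B", "C"]

def sample_team(split: str, scenario: str, team_size: int = 3) -> list:
    """Sample a team composition under one of the two OOC scenarios.

    scenario="simple": train on mixed teams (ABC), test on single-type teams (AAA).
    scenario="hard":   train on single-type teams (AAA), test on mixed teams (ABC).
    """
    # A mixed team draws each slot independently; a uniform team repeats one type.
    mixed = [random.choice(UNIT_TYPES) for _ in range(team_size)]
    uniform = [random.choice(UNIT_TYPES)] * team_size
    if scenario == "simple":
        return mixed if split == "train" else uniform
    if scenario == "hard":
        return uniform if split == "train" else mixed
    raise ValueError(f"unknown scenario: {scenario}")
```

Note the asymmetry the paper highlights: in the "simple" scenario the test-time teams lie inside the training support, while in the "hard" scenario every mixed test-time composition is unseen during training.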