State Combinatorial Generalization In Decision Making With Conditional Diffusion Models

Authors: Xintong Duan, Yutong He, Fahim Tajwar, Wentse Chen, Ruslan Salakhutdinov, Jeff Schneider

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experiments in maze, driving, and multiagent environments, we show that conditioned diffusion models outperform traditional RL techniques and highlight the broad applicability of our problem formulation.
Researcher Affiliation | Academia | Carnegie Mellon University EMAIL
Pseudocode | Yes | Algorithm 1: Planning with Attention-based Composition Conditioned Diffusion Model
Open Source Code | No | The paper uses third-party codebases such as stable_baseline3 (Raffin et al., 2021) for PPO and OpenRL (Huang et al., 2023) for MAPPO, but includes no explicit statement or link releasing the authors' own implementation of the described methodology.
Open Datasets | Yes | Experimentally, we evaluate the models on three distinct RL environments: maze, driving, and multiagent games. All three settings are easily adaptable to the OOC generalization problem using existing RL frameworks, demonstrating the broad applicability of the combinatorial state setup. ... Highway-Env (Leurent, 2018) is a self-driving environment ... The StarCraft Multi-Agent Challenge (SMAC/SMACv2) (Samvelyan et al., 2019; Ellis et al., 2022) is a multi-agent collaborative game ... In Maze2D (Fu et al., 2020)
Dataset Splits | Yes | During training time, the environment will only generate traffic of all cars or all bicycles with equal probability. During test time, environments will generate a mixture of cars and bicycles (detailed setup in Appendix C.7). ... We evaluate on two OOC scenarios: (1) Simple (different but overlapping support): train the model on randomly generated combinations (ABC) of all units and test it where all units on the team have the same type (AAA); (2) Hard (non-overlapping support): the opposite scenario, where we train on teams with only one unit type (AAA) but at test time see any composition of these three units (ABC).
Hardware Specification | Yes | Experiments are run on a single NVIDIA RTX A6000 GPU, with all code implemented in PyTorch.
Software Dependencies | No | The paper names PyTorch as the implementation framework and refers to stable_baseline3 (Raffin et al., 2021) for PPO and OpenRL (Huang et al., 2023) for MAPPO, but it does not provide version numbers for these components, which a reproducible description requires.
Experiment Setup | Yes | The hyperparameters shared for large and medium mazes are shown below in Table 1. ... Detailed parameters for PPO and diffusion are shown below in Table 3 and Table 4. ... Detailed parameters for MAPPO can be found in Table 6. ... Detailed parameters for training a conditioned diffusion model for 5v5 are shown below in Table 7 and 3v3 in Table 8.
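The two OOC scenarios quoted under Dataset Splits (train on mixed ABC teams, test on uniform AAA teams, or the reverse) can be sketched as a small composition sampler. This is an illustrative sketch only: the unit names, the `sample_team` function, and the team size are assumptions for demonstration, not details taken from the paper.

```python
import random

# Hypothetical three-unit pool standing in for the paper's SMAC unit types.
UNIT_TYPES = ["A", "B", "C"]

def sample_team(split: str, scenario: str, team_size: int = 3) -> list:
    """Sample a team composition under one of the two OOC scenarios.

    scenario="simple": train on mixed teams (ABC), test on single-type teams (AAA).
    scenario="hard":   train on single-type teams (AAA), test on mixed teams (ABC).
    """
    # A mixed team draws each slot independently; a uniform team repeats one type.
    mixed = [random.choice(UNIT_TYPES) for _ in range(team_size)]
    uniform = [random.choice(UNIT_TYPES)] * team_size
    if scenario == "simple":
        return mixed if split == "train" else uniform
    if scenario == "hard":
        return uniform if split == "train" else mixed
    raise ValueError(f"unknown scenario: {scenario}")
```

Note the asymmetry the paper highlights: in the "simple" scenario the test-time teams lie inside the training support, while in the "hard" scenario every mixed test-time composition is unseen during training.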