State Combinatorial Generalization In Decision Making With Conditional Diffusion Models
Authors: Xintong Duan, Yutong He, Fahim Tajwar, Wentse Chen, Ruslan Salakhutdinov, Jeff Schneider
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments in maze, driving, and multiagent environments, we show that conditioned diffusion models outperform traditional RL techniques and highlight the broad applicability of our problem formulation. |
| Researcher Affiliation | Academia | Carnegie Mellon University EMAIL |
| Pseudocode | Yes | Algorithm 1 Planning with Attention-based Composition Conditioned Diffusion Model |
| Open Source Code | No | The paper uses third-party codebases such as 'stable_baselines3 (Raffin et al., 2021)' for PPO and 'OpenRL (Huang et al., 2023)' for MAPPO, but does not include any explicit statement or link for the release of the authors' own implementation of the described methodology. |
| Open Datasets | Yes | Experimentally, we evaluate the models on three distinct RL environments: maze, driving, and multiagent games. All three settings are easily adaptable to the OOC generalization problem using existing RL frameworks, demonstrating the broad applicability of the combinatorial state setup. ... Highway-Env (Leurent, 2018) is a self-driving environment ... The StarCraft Multi-Agent Challenge (SMAC/SMACv2) (Samvelyan et al., 2019; Ellis et al., 2022) is a multi-agent collaborative game ... In Maze2D (Fu et al., 2020) |
| Dataset Splits | Yes | During training time, the environment will only generate traffic of all cars or all bicycles with equal probability. During test time, environments will generate a mixture of cars and bicycles (detailed setup in Appendix C.7). ... We evaluate on two OOC scenarios: (1) (Simple: Different but overlapping support): Train the model on randomly generated combinations (ABC) of all units and test it where all the units on the team have the same type (AAA); (2) (Hard: Non-overlapping support): the opposite scenario, where we train on teams with only one unit type (AAA), but during test time we see any composition of these three units (ABC). |
| Hardware Specification | Yes | Experiments are run on a single NVIDIA RTX A6000 GPU, with all code implemented in PyTorch. |
| Software Dependencies | No | The paper mentions 'PyTorch' as the implementation framework and refers to 'stable_baselines3 (Raffin et al., 2021)' and 'OpenRL (Huang et al., 2023)' for specific implementations (PPO and MAPPO), but it does not provide version numbers for these software components, which are required for a reproducible description. |
| Experiment Setup | Yes | The hyperparameters shared for large and medium mazes are shown below in Table 1. ... Detailed parameters for PPO and diffusion are shown below in Table 3 and Table 4. ... Detailed parameters for MAPPO can be found in Table 6. ... Detailed parameters for training a conditioned diffusion model for 5v5 are shown below in Table 7 and 3v3 in Table 8. |