What Makes a Good Diffusion Planner for Decision Making?

Authors: Haofei Lu, Dongqi Han, Yifei Shen, Dongsheng Li

ICLR 2025

Reproducibility assessment: for each variable, the result and the supporting LLM response.
Research Type: Experimental. We conducted an extensive empirical study to explore what constitutes an effective diffusion planner. By training and evaluating over 6,000 models, we analyzed key components critical to decision making in diffusion planning, including guided sampling algorithms, network architectures, action generation methods, and planning strategies.
Researcher Affiliation: Collaboration. Haofei Lu (Tsinghua University); Dongqi Han, Yifei Shen, Dongsheng Li (Microsoft Research Asia).
Pseudocode: Yes. Algorithm 1: Diffusion Veteran (DV) Simplified Pseudocode.
Open Source Code: Yes. We include the source code of DV in the supplementary material, which is also available at https://github.com/Josh00-Lu/DiffusionVeteran.
Open Datasets: Yes. We conducted experiments on the D4RL dataset (Fu et al., 2020), one of the most widely used benchmarks for offline RL and imitation learning. To examine whether the conclusions drawn from our experiments generalize to other tasks, we also conducted experiments on the Adroit Hand dataset (Rajeswaran et al., 2018; Fu et al., 2020).
Dataset Splits: Yes. The Cloned dataset consists of a 50-50 split between demonstration data and 2,500 trajectories sampled from a behaviorally cloned policy trained on these demonstrations. The demonstration data includes 25 human trajectories, which are duplicated 100 times to match the number of cloned trajectories. The Expert dataset comprises 5,000 trajectories sampled from an expert policy that successfully solves the task, as provided in the DAPG repository.
Hardware Specification: No. The paper mentions "significant computational resources, particularly in terms of GPU energy consumption" but does not specify any particular GPU model, CPU, or other hardware details used for the experiments.
Software Dependencies: No. The paper mentions software components such as CleanDiffuser (Dong et al., 2024b), the Adam optimizer (Kingma & Ba, 2014), DDIM (Song et al., 2020), and DDPM, but it does not provide specific version numbers for any of these dependencies.
Experiment Setup: Yes. All the planner diffusion models are trained with the Adam (Kingma & Ba, 2014) optimizer, using a learning rate of 3e-4 and a batch size of 128, for 1M gradient steps. Table 2 (Configuration Settings) provides a detailed list of hyperparameters and default choices, including "Guidance Type", "Planning Horizon", "Planning Stride", "Planner Training Steps", "Planner Temperature", "MCSS Candidates", "Learning Rate", and "Batch Size".
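The reported training setup can be captured in a minimal, self-contained configuration sketch. Note this is illustrative only: the dictionary keys and the `summarize` helper are hypothetical names, not the authors' code; the values are the ones reported above.

```python
# Hypothetical sketch of the reported planner training configuration.
# Key names are illustrative; only the values come from the paper's text.
PLANNER_TRAIN_CONFIG = {
    "optimizer": "Adam",          # Kingma & Ba, 2014
    "learning_rate": 3e-4,
    "batch_size": 128,
    "gradient_steps": 1_000_000,  # "1M gradient steps"
}

def summarize(config: dict) -> str:
    """Render the training configuration as a one-line summary string."""
    return (
        f"{config['optimizer']}, lr={config['learning_rate']}, "
        f"batch={config['batch_size']}, steps={config['gradient_steps']}"
    )
```

A record like this makes it easy to check a reimplementation against the reported hyperparameters before launching a long training run.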