What Makes a Good Diffusion Planner for Decision Making?

Authors: Haofei Lu, Dongqi Han, Yifei Shen, Dongsheng Li

ICLR 2025

Reproducibility assessment: for each variable, the result and the supporting LLM response.
Research Type: Experimental. We conducted an extensive empirical study to explore what constitutes an effective diffusion planner. By training and evaluating over 6,000 models, we analyzed key components critical to decision making in diffusion planning, including guided sampling algorithms, network architectures, action generation methods, and planning strategies.
Researcher Affiliation: Collaboration. Haofei Lu (Tsinghua University); Dongqi Han, Yifei Shen, Dongsheng Li (Microsoft Research Asia).
Pseudocode: Yes. Algorithm 1: Diffusion Veteran (DV) Simplified Pseudocode.
Open Source Code: Yes. We include the source code of DV in the supplementary material, which is also available at https://github.com/Josh00-Lu/DiffusionVeteran.
Open Datasets: Yes. We conducted experiments on the D4RL dataset (Fu et al., 2020), one of the most widely used benchmarks for offline RL and imitation learning. To examine whether the conclusions drawn from our experiments generalize to other tasks, we also conducted experiments on the Adroit Hand dataset (Rajeswaran et al., 2018; Fu et al., 2020).
Dataset Splits: Yes. The Cloned dataset consists of a 50-50 split between demonstration data and 2,500 trajectories sampled from a behaviorally cloned policy trained on these demonstrations. The demonstration data includes 25 human trajectories, which are duplicated 100 times to match the number of cloned trajectories. The Expert dataset comprises 5,000 trajectories sampled from an expert policy that successfully solves the task, as provided in the DAPG repository.
Hardware Specification: No. The paper mentions "significant computational resources, particularly in terms of GPU energy consumption" but does not specify any particular GPU model, CPU, or other hardware details used for the experiments.
Software Dependencies: No. The paper mentions software components such as CleanDiffuser (Dong et al., 2024b), the Adam optimizer (Kingma & Ba, 2014), DDIM (Song et al., 2020), and DDPM, but it does not provide specific version numbers for any of these dependencies.
Experiment Setup: Yes. All the planner diffusion models are trained with the Adam (Kingma & Ba, 2014) optimizer, using a learning rate of 3e-4 and a batch size of 128, for 1M gradient steps. Table 2 (Configuration Settings) provides a detailed list of hyperparameters and default choices, including "Guidance Type", "Planning Horizon", "Planning Stride", "Planner Training Steps", "Planner Temperature", "MCSS Candidates", "Learning Rate", and "Batch Size".
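The reported training setup can be captured in a minimal, self-contained configuration sketch. Note this is illustrative only: the dictionary keys and the `summarize` helper are hypothetical names, not the authors' code; the values are the ones reported above.

```python
# Hypothetical sketch of the reported planner training configuration.
# Key names are illustrative; only the values come from the paper's text.
PLANNER_TRAIN_CONFIG = {
    "optimizer": "Adam",          # Kingma & Ba, 2014
    "learning_rate": 3e-4,
    "batch_size": 128,
    "gradient_steps": 1_000_000,  # "1M gradient steps"
}

def summarize(config: dict) -> str:
    """Render the training configuration as a one-line summary string."""
    return (
        f"{config['optimizer']}, lr={config['learning_rate']}, "
        f"batch={config['batch_size']}, steps={config['gradient_steps']}"
    )
```

A record like this makes it easy to check a reimplementation against the reported hyperparameters before launching a long training run.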