DIME: Diffusion-Based Maximum Entropy Reinforcement Learning

Authors: Onur Celik, Zechu Li, Denis Blessing, Ge Li, Daniel Palenicek, Jan Peters, Georgia Chalvatzaki, Gerhard Neumann

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "On 13 challenging continuous high-dimensional control benchmarks, we empirically validate that DIME significantly outperforms other diffusion-based methods on all environments and consistently outperforms other state-of-the-art non-diffusion-based RL methods, while requiring fewer algorithmic design choices and smaller update-to-data ratios, reducing computational complexity."
Researcher Affiliation | Academia | "1Autonomous Learning Robots, KIT; 2Interactive Robot Perception & Learning, TU Darmstadt; 3Intelligent Autonomous Systems, TU Darmstadt; 4Hessian.AI; 5German Research Center for AI; 6Centre for Cognitive Science, TU Darmstadt. Correspondence to: Onur Celik <EMAIL>."
Pseudocode | Yes | "Algorithm 1 DIME: Diffusion-Based Maximum Entropy Reinforcement Learning"
Open Source Code | Yes | https://alrhub.github.io/dime-website/
Open Datasets | Yes | "We consider a broad range of 13 sophisticated learning environments from different benchmark suites, ranging over MuJoCo Gym (Brockman et al., 2016), the DeepMind Control Suite (DMC) (Tunyasuvunakool et al., 2020), and MyoSuite (Caggiano et al., 2022)."
Dataset Splits | No | The paper uses standard benchmark suites (MuJoCo Gym, DeepMind Control Suite, MyoSuite) with defined environments for training and evaluation, but it does not describe how the data collected from these environments is split into training, validation, or test sets in the traditional supervised-learning sense. It mentions running experiments with "10 seeds" but gives no split percentages or methodology.
Hardware Specification | Yes | "The number of diffusion steps might affect the performance and the computation time. (a) shows DIME's learning curves for varying diffusion steps. Two diffusion steps perform badly, whereas four and eight diffusion steps perform similarly but still worse than 16 and 32 diffusion steps, which perform similarly. (b) shows the computation time for 1M steps of the corresponding learning curves. The fewer the diffusion steps, the less computation time is required. Learning curves on the Gym benchmark suite (c)-(d): We compare DIME against various diffusion baselines and CrossQ on (c) Ant-v3 and (d) Humanoid-v3 from the Gym suite. While all diffusion-based methods are outperformed by DIME, DIME performs on par with CrossQ on the Ant environment. DIME performs favorably on the high-dimensional Humanoid-v3 environment, where it also outperforms CrossQ."
Software Dependencies | No | The paper mentions specific algorithms and components (e.g., the "CrossQ algorithm", the "Adam optimizer", "Batch Renormalization") but does not provide version numbers for any software libraries or dependencies used (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | "Table 2. Hyperparameters of DIME and all diffusion-based algorithms for all benchmark suites. Varying hyperparameters for different benchmark suites are described in the text. Table 3. Hyperparameters of DIME and Gaussian-based algorithms for all benchmark suites. Varying hyperparameters for different benchmark suites are described in the text."
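The "update-to-data ratio" cited under Research Type is the number of gradient updates performed per collected environment step; a smaller ratio means less compute for the same amount of data. A minimal sketch of the idea, assuming hypothetical stand-in functions (`env_step`, `gradient_update` are illustrative, not from the paper or its code):

```python
import random

def env_step(action):
    # Stand-in for one environment transition (hypothetical).
    return random.random()

def gradient_update(batch):
    # Stand-in for one actor/critic gradient step (hypothetical).
    return sum(batch) / len(batch)

def train(total_env_steps, utd_ratio):
    """Interleave data collection and learning: perform `utd_ratio`
    gradient updates for every environment step collected."""
    replay = []
    n_updates = 0
    for _ in range(total_env_steps):
        replay.append(env_step(action=0.0))
        for _ in range(utd_ratio):
            # Update on a small recent batch from the replay buffer.
            gradient_update(replay[-min(len(replay), 32):])
            n_updates += 1
    return n_updates
```

With `utd_ratio=1`, ten environment steps cost ten updates; with `utd_ratio=4`, the same data costs forty, which is why a smaller ratio reduces computational complexity.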
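The ablation quoted under Hardware Specification varies the number of diffusion steps used to draw each action, trading sample quality against per-action compute. A toy sketch of why cost scales linearly with the step count, using a generic annealed-Langevin-style sampler against a standard-Gaussian target (`toy_score`, `sample_action`, and all constants are illustrative assumptions, not DIME's parameterization):

```python
import math
import random

def toy_score(x, noise_level):
    # Score of a standard Gaussian target, damped at high noise (toy choice).
    return [-xi / (1.0 + noise_level) for xi in x]

def sample_action(dim, n_diffusion_steps, step_size=0.1, seed=0):
    """Draw one action vector by iteratively denoising Gaussian noise.
    Per-action cost is linear in `n_diffusion_steps`: one score
    evaluation per step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in range(n_diffusion_steps, 0, -1):
        noise_level = t / n_diffusion_steps
        s = toy_score(x, noise_level)
        # Inject noise on all but the final (deterministic) step.
        noise_scale = math.sqrt(2.0 * step_size) if t > 1 else 0.0
        x = [xi + step_size * si + noise_scale * rng.gauss(0.0, 1.0)
             for xi, si in zip(x, s)]
    return x

action = sample_action(dim=6, n_diffusion_steps=16)
```

Doubling `n_diffusion_steps` doubles the number of network (here, `toy_score`) evaluations per action, matching the reported pattern that fewer diffusion steps require less computation time.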