DIME: Diffusion-Based Maximum Entropy Reinforcement Learning

Authors: Onur Celik, Zechu Li, Denis Blessing, Ge Li, Daniel Palenicek, Jan Peters, Georgia Chalvatzaki, Gerhard Neumann

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "On 13 challenging continuous high-dimensional control benchmarks, we empirically validate that DIME significantly outperforms other diffusion-based methods on all environments and consistently outperforms other state-of-the-art non-diffusion-based RL methods, while requiring fewer algorithmic design choices and smaller update-to-data ratios, reducing computational complexity."
Researcher Affiliation | Academia | "1Autonomous Learning Robots, KIT; 2Interactive Robot Perception & Learning, TU Darmstadt; 3Intelligent Autonomous Systems, TU Darmstadt; 4Hessian.AI; 5German Research Center for AI; 6Centre for Cognitive Science, TU Darmstadt. Correspondence to: Onur Celik <EMAIL>."
Pseudocode | Yes | "Algorithm 1 DIME: Diffusion-Based Maximum Entropy Reinforcement Learning"
Open Source Code | Yes | https://alrhub.github.io/dime-website/
Open Datasets | Yes | "We consider a broad range of 13 sophisticated learning environments from different benchmark suites, ranging over MuJoCo Gym (Brockman et al., 2016), the DeepMind Control Suite (DMC) (Tunyasuvunakool et al., 2020), and MyoSuite (Caggiano et al., 2022)."
Dataset Splits | No | The paper uses standard benchmark suites (MuJoCo Gym, DeepMind Control Suite, MyoSuite) with defined environments for training and evaluation, but it does not describe how the data collected from these environments is split into training, validation, or test sets in the traditional supervised-learning sense. It mentions running experiments with "10 seeds" but gives no split percentages or methodology.
Hardware Specification | Yes | "The number of diffusion steps might affect the performance and the computation time. (a) shows DIME's learning curves for varying diffusion steps. Two diffusion steps perform badly, whereas four and eight diffusion steps perform similarly but still worse than 16 and 32 diffusion steps, which perform similarly. (b) shows the computation time for 1M steps of the corresponding learning curves. The fewer the diffusion steps, the less computation time is required. Learning curves on the Gym benchmark suite (c)-(d): We compare DIME against various diffusion baselines and CrossQ on (c) Ant-v3 and (d) Humanoid-v3 from the Gym suite. While all diffusion-based methods are outperformed by DIME, DIME performs on par with CrossQ on the Ant environment. DIME performs favorably on the high-dimensional Humanoid-v3 environment, where it also outperforms CrossQ."
Software Dependencies | No | The paper mentions specific algorithms and components (e.g., the "CrossQ algorithm", the "Adam optimizer", "Batch Renormalization") but does not provide version numbers for any software libraries or dependencies used (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | "Table 2. Hyperparameters of DIME and all diffusion-based algorithms for all benchmark suites. Varying hyperparameters for different benchmark suites are described in the text. Table 3. Hyperparameters of DIME and Gaussian-based algorithms for all benchmark suites. Varying hyperparameters for different benchmark suites are described in the text."
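The "update-to-data ratio" cited under Research Type is the number of gradient updates performed per collected environment step; a smaller ratio means less compute for the same amount of data. A minimal sketch of the idea, assuming hypothetical stand-in functions (`env_step`, `gradient_update` are illustrative, not from the paper or its code):

```python
import random

def env_step(action):
    # Stand-in for one environment transition (hypothetical).
    return random.random()

def gradient_update(batch):
    # Stand-in for one actor/critic gradient step (hypothetical).
    return sum(batch) / len(batch)

def train(total_env_steps, utd_ratio):
    """Interleave data collection and learning: perform `utd_ratio`
    gradient updates for every environment step collected."""
    replay = []
    n_updates = 0
    for _ in range(total_env_steps):
        replay.append(env_step(action=0.0))
        for _ in range(utd_ratio):
            # Update on a small recent batch from the replay buffer.
            gradient_update(replay[-min(len(replay), 32):])
            n_updates += 1
    return n_updates
```

With `utd_ratio=1`, ten environment steps cost ten updates; with `utd_ratio=4`, the same data costs forty, which is why a smaller ratio reduces computational complexity.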
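The ablation quoted under Hardware Specification varies the number of diffusion steps used to draw each action, trading sample quality against per-action compute. A toy sketch of why cost scales linearly with the step count, using a generic annealed-Langevin-style sampler against a standard-Gaussian target (`toy_score`, `sample_action`, and all constants are illustrative assumptions, not DIME's parameterization):

```python
import math
import random

def toy_score(x, noise_level):
    # Score of a standard Gaussian target, damped at high noise (toy choice).
    return [-xi / (1.0 + noise_level) for xi in x]

def sample_action(dim, n_diffusion_steps, step_size=0.1, seed=0):
    """Draw one action vector by iteratively denoising Gaussian noise.
    Per-action cost is linear in `n_diffusion_steps`: one score
    evaluation per step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in range(n_diffusion_steps, 0, -1):
        noise_level = t / n_diffusion_steps
        s = toy_score(x, noise_level)
        # Inject noise on all but the final (deterministic) step.
        noise_scale = math.sqrt(2.0 * step_size) if t > 1 else 0.0
        x = [xi + step_size * si + noise_scale * rng.gauss(0.0, 1.0)
             for xi, si in zip(x, s)]
    return x

action = sample_action(dim=6, n_diffusion_steps=16)
```

Doubling `n_diffusion_steps` doubles the number of network (here, `toy_score`) evaluations per action, matching the reported pattern that fewer diffusion steps require less computation time.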