DIME: Diffusion-Based Maximum Entropy Reinforcement Learning
Authors: Onur Celik, Zechu Li, Denis Blessing, Ge Li, Daniel Palenicek, Jan Peters, Georgia Chalvatzaki, Gerhard Neumann
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On 13 challenging continuous high-dimensional control benchmarks, we empirically validate that DIME significantly outperforms other diffusion-based methods on all environments and consistently outperforms other state-of-the-art non-diffusion-based RL methods while requiring fewer algorithmic design choices and smaller update-to-data ratios, reducing computational complexity. |
| Researcher Affiliation | Academia | 1Autonomous Learning Robots, KIT 2Interactive Robot Perception & Learning, TU Darmstadt 3Intelligent Autonomous Systems, TU Darmstadt 4Hessian.AI 5German Research Center for AI 6Centre for Cognitive Science, TU Darmstadt. Correspondence to: Onur Celik <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 DIME: Diffusion-Based Maximum Entropy Reinforcement Learning |
| Open Source Code | Yes | 1https://alrhub.github.io/dime-website/ |
| Open Datasets | Yes | We consider a broad range of 13 sophisticated learning environments from different benchmark suites, including mujoco gym (Brockman et al., 2016), the deepmind control suite (DMC) (Tunyasuvunakool et al., 2020), and myo suite (Caggiano et al., 2022) |
| Dataset Splits | No | The paper uses standard benchmark suites (mujoco gym, deepmind control suite, myo suite) with defined environments for training and evaluation. However, it does not explicitly describe splitting the data collected from these environments into training, validation, or test sets in the traditional supervised-learning sense. It mentions running experiments with "10 seeds" but gives no dataset split percentages or methodology. |
| Hardware Specification | Yes | The number of diffusion steps might affect the performance and the computation time. (a) shows DIME's learning curves for varying diffusion steps. Two diffusion steps perform badly, whereas four and eight diffusion steps perform similarly but still worse than 16 and 32 diffusion steps, which perform similarly. (b) shows the computation time for 1M steps of the corresponding learning curves. The fewer the diffusion steps, the less computation time is required. Learning Curves on Gym Benchmark Suite (c)-(d). We compare DIME against various diffusion baselines and Cross Q on the (c) Ant-v3 and (d) Humanoid-v3 from the Gym suite. While all diffusion-based methods are outperformed by DIME, DIME performs on par with Cross Q on the Ant environment. DIME performs favorably on the high-dimensional Humanoid-v3 environment, where it also outperforms Cross Q. |
| Software Dependencies | No | The paper mentions using specific algorithms and components (e.g., "Cross Q algorithm", "Adam optimizer", "Batch Renormalization") but does not provide specific version numbers for any software libraries or dependencies used (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | Table 2. Hyperparameters of DIME and all diffusion-based algorithms for all benchmark suites. Varying hyperparameters for different benchmark suites are described in the text. Table 3. Hyperparameters of DIME and Gaussian-based algorithms for all benchmark suites. Varying hyperparameters for different benchmark suites are described in the text. |
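The diffusion-step ablation quoted above (2 vs. 4/8 vs. 16/32 steps, trading return against compute) can be illustrated with a minimal sketch of how a diffusion policy draws an action through a reverse-diffusion chain. This is not the authors' implementation: `score_net` is a hypothetical placeholder for the learned denoising network, and `sample_action`, `step_size`, and the action bounds are illustrative assumptions, not names from the paper.

```python
# Hedged sketch: sampling one action from a diffusion policy with a
# configurable number of reverse-diffusion steps. More steps generally
# track the target policy distribution better but cost more compute per
# environment step -- the trade-off swept in the paper's ablation.
import numpy as np


def score_net(action, state, t):
    # Hypothetical stand-in for a learned denoiser: nudges the noisy
    # action toward a state-dependent mean. A real network would be
    # trained and conditioned on the diffusion time t.
    return np.tanh(state) - action


def sample_action(state, num_diffusion_steps=16, step_size=0.1, rng=None):
    """Draw one action by integrating a reverse-diffusion chain.

    Starts from Gaussian noise and repeatedly applies the (stubbed)
    score network plus injected noise; the final step is noise-free.
    """
    rng = np.random.default_rng() if rng is None else rng
    action = rng.standard_normal(state.shape)  # a_K ~ N(0, I)
    for t in range(num_diffusion_steps, 0, -1):
        drift = score_net(action, state, t)
        noise_scale = np.sqrt(step_size) if t > 1 else 0.0
        action = action + step_size * drift + noise_scale * rng.standard_normal(state.shape)
    # Clip to the typical [-1, 1] continuous-control action range.
    return np.clip(action, -1.0, 1.0)


if __name__ == "__main__":
    state = np.zeros(6)  # toy 6-dimensional observation
    a = sample_action(state, num_diffusion_steps=8, rng=np.random.default_rng(0))
    print(a.shape)  # (6,)
```

With a trained score network, varying `num_diffusion_steps` over {2, 4, 8, 16, 32} would reproduce the shape of the ablation: per-action cost grows linearly with the step count, so the reported preference for smaller update-to-data ratios directly limits wall-clock time.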