Maximum Entropy Reinforcement Learning with Diffusion Policy

Authors: Xiaoyi Dong, Jian Cheng, Xi Sheryl Zhang

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results on MuJoCo benchmarks show that MaxEntDP outperforms the Gaussian policy and other generative models within the MaxEnt RL framework, and performs comparably to other state-of-the-art diffusion-based online RL algorithms. Our code is available at https://github.com/diffusionyes/MaxEntDP. ... In this section, we conduct experiments to address the following questions: (1) Can MaxEntDP effectively learn a multi-modal policy in a multi-goal task? (2) Does the diffusion policy outperform the Gaussian policy and other generative models within the MaxEnt RL framework? (3) How does performance vary when replacing the Q-weighted Noise Estimation method with competing approaches, such as QSM and iDEM? (4) How does MaxEntDP compare to other diffusion-based online RL algorithms? (5) Does the MaxEnt RL objective benefit policy training?
Researcher Affiliation Academia Xiaoyi Dong (1,2), Jian Cheng (1,3,4), Xi Sheryl Zhang (1,4). (1) C2DL, Institute of Automation, Chinese Academy of Sciences; (2) School of Artificial Intelligence, University of Chinese Academy of Sciences; (3) School of Future Technology, University of Chinese Academy of Sciences; (4) AiRiA. Correspondence to: Xi Sheryl Zhang <EMAIL>.
Pseudocode Yes The pseudocode for our method is presented in Algorithm 1. Algorithm 1 MaxEnt RL with Diffusion Policy
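For context, Algorithm 1 builds on the standard maximum-entropy RL objective, which augments the expected return with a policy-entropy bonus (this is the standard formulation, not a quote from the paper):

```latex
J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} \left( r(s_t, a_t) + \alpha\, \mathcal{H}\!\left(\pi(\cdot \mid s_t)\right) \right)\right]
```

where the temperature α trades off reward maximization against policy stochasticity.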
Open Source Code Yes Our code is available at https://github.com/diffusionyes/MaxEntDP.
Open Datasets Yes Experimental results on MuJoCo benchmarks show that MaxEntDP outperforms the Gaussian policy and other generative models within the MaxEnt RL framework... We test MaxEntDP on 3 high-dimensional tasks on the DeepMind Control Suite benchmarks.
Dataset Splits No The paper uses the MuJoCo and DeepMind Control Suite benchmarks, which are simulation environments for reinforcement learning. It describes the experimental setup in terms of environment interactions but does not specify fixed training/validation/test splits, percentages, or a splitting methodology for static datasets; such splits are not typically applicable in online reinforcement learning.
Hardware Specification Yes All experiments in this paper are conducted on a GPU of NVIDIA GeForce RTX 3090 and a CPU of AMD EPYC 7742.
Software Dependencies No The paper mentions "Leveraging the computation efficiency of JAX (Frostig et al., 2018)" but does not specify a version number for JAX or for any other key software library used in the implementation, nor versions for the official code of the baseline algorithms.
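Since the report flags missing dependency versions, a minimal way to record them alongside experiment logs is to query installed package metadata (a generic stdlib sketch; the package names passed in are illustrative, not taken from the paper):

```python
from importlib import metadata


def package_versions(names):
    """Return {package: installed version or 'not installed'},
    suitable for dumping into a reproducibility log."""
    out = {}
    for name in names:
        try:
            out[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            out[name] = "not installed"
    return out


# Example: record the libraries an experiment depends on.
print(package_versions(["jax", "numpy"]))
```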
Experiment Setup Yes The shared hyperparameters of all algorithms are listed in Table 1. Table 1. The shared hyperparameters of all algorithms. Hyperparameter (for MaxEntDP, SAC, MEow, TD3, QSM, DACER, QVPO, DIPO): Batch size 256 ... Diffusion steps 20 ... Actor learning rate 3e-4 ...
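The shared settings quoted from Table 1 can be pinned in a small config mapping (a sketch containing only the three values visible in this report; the remaining rows of Table 1 are elided here):

```python
# Shared hyperparameters quoted from Table 1 of the paper.
# Only the values quoted in this report are included.
shared_hparams = {
    "batch_size": 256,       # minibatch size, shared by all algorithms
    "diffusion_steps": 20,   # denoising steps for the diffusion policy
    "actor_lr": 3e-4,        # actor learning rate
}

for name, value in shared_hparams.items():
    print(f"{name}: {value}")
```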