Behavior-Regularized Diffusion Policy Optimization for Offline Reinforcement Learning
Authors: Chen-Xiao Gao, Chenyang Wu, Mingjun Cao, Chenjun Xiao, Yang Yu, Zongzhang Zhang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive evaluations conducted on synthetic 2D tasks and continuous control tasks from the D4RL benchmark validate its effectiveness and superior performance. (Abstract) and Table 1. Comparison of BDPO and various baseline methods on locomotion-v2 and antmaze-v0 datasets from D4RL. |
| Researcher Affiliation | Academia | 1National Key Laboratory for Novel Software Technology, Nanjing University, China & School of Artificial Intelligence, Nanjing University, China 2The Chinese University of Hong Kong, Shenzhen, China. Correspondence to: Zongzhang Zhang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Behavior-Regularized Diffusion Policy Optimization (BDPO) |
| Open Source Code | Yes | The code and experiment results of BDPO are available on the project webpage. |
| Open Datasets | Yes | Comprehensive evaluations conducted on synthetic 2D tasks and continuous control tasks from the D4RL benchmark validate its effectiveness and superior performance. (Abstract) and For the offline dataset, we choose the -v2 datasets with three levels of qualities provided by D4RL (Fu et al., 2020) (Section A). |
| Dataset Splits | No | The paper uses offline datasets from D4RL (Fu et al., 2020), describing their composition (e.g., 'medium', 'medium-replay', 'medium-expert') but does not specify how these datasets were further split into training, testing, or validation sets for their experiments. Evaluation refers to running the learned policy in an environment. |
| Hardware Specification | Yes | We evaluate BDPO, DAC, and Diffusion-QL with workstations equipped with NVIDIA RTX 4090 cards and the walker2d-medium-replay-v2 dataset. |
| Software Dependencies | No | The paper mentions software like 'PyTorch' and 'JAX' (Figure 12) for implementation and 'ADAM' (Table 3) as an optimizer, but it does not specify any version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Table 3. Common hyperparameters across all datasets. and Table 4. Hyper-parameters that vary in different tasks. (Section B.1) explicitly list numerous hyperparameters and training settings. |
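Since the paper does not report a train/validation split for the D4RL datasets (see the Dataset Splits row), a held-out evaluation protocol would have to be defined by the reproducer. The sketch below shows one way this could be done; `split_offline_dataset` and its arguments are hypothetical helpers, not part of BDPO or D4RL.

```python
import random

def split_offline_dataset(transitions, val_fraction=0.1, seed=0):
    """Split a list of offline transitions into train/validation subsets.

    Hypothetical helper: the paper reports no such split, so this only
    illustrates what a reproducer's held-out protocol could look like.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    indices = list(range(len(transitions)))
    rng.shuffle(indices)
    n_val = int(len(transitions) * val_fraction)
    val_idx = indices[:n_val]
    train = [transitions[i] for i in indices[n_val:]]
    val = [transitions[i] for i in val_idx]
    return train, val

# Toy usage with placeholder (state, action, reward, next_state) tuples
# standing in for D4RL transitions.
data = [(float(s), 0.0, 1.0, float(s + 1)) for s in range(100)]
train_set, val_set = split_offline_dataset(data, val_fraction=0.1)
```

With `val_fraction=0.1` on 100 transitions, this yields 90 training and 10 validation tuples; in practice, evaluation in the paper instead means rolling out the learned policy in the environment.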