Behavior-Regularized Diffusion Policy Optimization for Offline Reinforcement Learning

Authors: Chen-Xiao Gao, Chenyang Wu, Mingjun Cao, Chenjun Xiao, Yang Yu, Zongzhang Zhang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Comprehensive evaluations conducted on synthetic 2D tasks and continuous control tasks from the D4RL benchmark validate its effectiveness and superior performance." (Abstract); see also "Table 1. Comparison of BDPO and various baseline methods on locomotion-v2 and antmaze-v0 datasets from D4RL."
Researcher Affiliation | Academia | "1National Key Laboratory for Novel Software Technology, Nanjing University, China & School of Artificial Intelligence, Nanjing University, China; 2The Chinese University of Hong Kong, Shenzhen, China. Correspondence to: Zongzhang Zhang <EMAIL>."
Pseudocode | Yes | "Algorithm 1: Behavior-Regularized Diffusion Policy Optimization (BDPO)"
Open Source Code | Yes | "The code and experiment results of BDPO are available on the project webpage."
Open Datasets | Yes | "Comprehensive evaluations conducted on synthetic 2D tasks and continuous control tasks from the D4RL benchmark validate its effectiveness and superior performance." (Abstract); "For the offline dataset, we choose the -v2 datasets with three levels of qualities provided by D4RL (Fu et al., 2020)." (Section A)
Dataset Splits | No | The paper uses offline datasets from D4RL (Fu et al., 2020) and describes their composition (e.g., 'medium', 'medium-replay', 'medium-expert'), but does not specify how these datasets were split into training, validation, or test sets for the experiments. Evaluation consists of running the learned policy in the environment.
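The lack of an explicit split is typical of offline RL: methods like BDPO train on the entire fixed D4RL dataset and are evaluated by rolling out the learned policy in the environment. A minimal sketch of that setup is below; the array shapes, helper names, and dimensions are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def make_dummy_offline_dataset(n=1000, obs_dim=17, act_dim=6, seed=0):
    # Illustrative stand-in for a D4RL-style dataset: a dict of aligned
    # arrays holding (s, a, r, s', done) transitions. Real datasets are
    # loaded from D4RL; random data is used here only to show the schema.
    rng = np.random.default_rng(seed)
    return {
        "observations": rng.normal(size=(n, obs_dim)).astype(np.float32),
        "actions": rng.uniform(-1, 1, size=(n, act_dim)).astype(np.float32),
        "rewards": rng.normal(size=(n,)).astype(np.float32),
        "next_observations": rng.normal(size=(n, obs_dim)).astype(np.float32),
        "terminals": np.zeros(n, dtype=bool),
    }

def sample_batch(dataset, batch_size, rng):
    # Offline training repeatedly samples minibatches from the full
    # dataset -- there is no held-out validation or test split.
    idx = rng.integers(0, len(dataset["rewards"]), size=batch_size)
    return {key: arr[idx] for key, arr in dataset.items()}
```

Under this convention, "evaluation" means periodically running the current policy for several episodes in the simulator and reporting normalized returns, which is why the paper reports environment scores rather than held-out-set metrics.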
Hardware Specification | Yes | "We evaluate BDPO, DAC, and Diffusion-QL with workstations equipped with NVIDIA RTX 4090 cards and the walker2d-medium-replay-v2 dataset."
Software Dependencies | No | The paper mentions software such as PyTorch and JAX (Figure 12) for implementation and the Adam optimizer (Table 3), but it does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | Table 3 ("Common hyperparameters across all datasets") and Table 4 ("Hyper-parameters that vary in different tasks", Section B.1) explicitly list the hyperparameters and training settings.