Habitizing Diffusion Planning for Efficient and Effective Decision Making

Authors: Haofei Lu, Yifei Shen, Dongsheng Li, Junliang Xing, Dongqi Han

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We further conduct comprehensive evaluations across various tasks, offering empirical insights into efficient and effective decision making. ... We empirically evaluate Habi on a diverse set of tasks from the D4RL dataset (Fu et al., 2020), one of the most widely used benchmarks for offline RL. ... All the results are calculated over 500 episode seeds for each task to provide a reliable evaluation. HI's results are additionally averaged on 5 training seeds to ensure robustness."
Researcher Affiliation | Collaboration | "The work was conducted during the internship of Haofei Lu (EMAIL) at Microsoft Research Asia. 1Department of Computer Science and Technology, Tsinghua University; 2Microsoft Research Asia. Correspondence to: Junliang Xing <EMAIL>, Dongqi Han <EMAIL>."
Pseudocode | No | The paper describes methods and processes but does not include any clearly labeled pseudocode or algorithm blocks. Figure 3 is a diagram, not pseudocode.
Open Source Code | Yes | "Our code is anonymously available at https://bayesbrain.github.io/."
Open Datasets | Yes | "We empirically evaluate Habi on a diverse set of tasks from the D4RL dataset (Fu et al., 2020), one of the most widely used benchmarks for offline RL."
Dataset Splits | Yes | "We empirically evaluate Habi on a diverse set of tasks from the D4RL dataset (Fu et al., 2020), one of the most widely used benchmarks for offline RL. ... All the results are calculated over 500 episode seeds for each task to provide a reliable evaluation. HI's results are additionally averaged on 5 training seeds to ensure robustness."
Hardware Specification | Yes | "All runtime measurements were conducted on two different computing hardwares: a laptop CPU (Apple M2 Max) or a server GPU (Nvidia A100)." Training was on Nvidia A100 GPUs.
Software Dependencies | No | The paper mentions "Optimizer Adam" in Table 3, and CleanDiffuser (Dong et al., 2024b) was used to reproduce baselines, but it does not provide specific version numbers for any software libraries or dependencies used for the main methodology.
Experiment Setup | Yes | Table 3 (Hyperparameters in our experiments): Optimizer: Adam; Learning Rate: 3e-4; Gradient Steps: 1,000,000; Batch Size: 256; Latent Dimension Dim(z): 256; MLP Hidden Size (Encoder & Decoder): 256; MLP Hidden Layers (Encoder & Decoder): 2; Habitization Target (Locomotion Related): DQL (Wang et al., 2023) (MuJoCo, Antmaze); Habitization Target (Planning Related): DV (Lu et al., 2025) (Kitchen, Maze2D); Target KL-divergence D_KL^tar: 1.0; Number of Sampling Candidates in Habitization Training: 50; Number of Sampling Candidates in Habitual Inference: 5.
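For readers who want to mirror this setup, the Table 3 values can be collected into a single config object. The sketch below is illustrative only: the dict and key names are our own choices, not taken from the authors' code release.

```python
# Hyperparameters from Table 3 of the paper, gathered as a plain config dict.
# Key names are our own (hypothetical), values are the reported ones.
HABI_HYPERPARAMS = {
    "optimizer": "Adam",
    "learning_rate": 3e-4,
    "gradient_steps": 1_000_000,
    "batch_size": 256,
    "latent_dim": 256,           # Dim(z)
    "mlp_hidden_size": 256,      # encoder & decoder
    "mlp_hidden_layers": 2,      # encoder & decoder
    "habitization_target": {
        "locomotion": "DQL",     # MuJoCo, Antmaze (Wang et al., 2023)
        "planning": "DV",        # Kitchen, Maze2D (Lu et al., 2025)
    },
    "target_kl_divergence": 1.0,         # D_KL^tar
    "sampling_candidates_training": 50,  # habitization training
    "sampling_candidates_inference": 5,  # habitual inference
}
```

Keeping the values in one place like this makes it easy to diff a reproduction attempt against the paper's reported settings.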
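The evaluation protocol quoted in the table (500 episode seeds per task, with HI's results further averaged over 5 training seeds) can be sketched as a two-level mean. The function name and the synthetic scores below are hypothetical, used only to illustrate the aggregation:

```python
import statistics

def aggregate_score(scores_by_training_seed):
    """Two-level averaging as described in the paper's protocol:
    mean over episode seeds within each training seed, then mean
    over training seeds. Inputs are illustrative, not paper data."""
    per_seed_means = [statistics.mean(episodes)
                      for episodes in scores_by_training_seed]
    return statistics.mean(per_seed_means)

# 5 hypothetical training seeds, each with 500 synthetic episode scores.
fake_runs = [[float(s % 7) for s in range(500)] for _ in range(5)]
overall = aggregate_score(fake_runs)
```

Averaging per training seed first (rather than pooling all 2,500 episodes) keeps each training seed equally weighted even if episode counts were ever unequal.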