Habitizing Diffusion Planning for Efficient and Effective Decision Making
Authors: Haofei Lu, Yifei Shen, Dongsheng Li, Junliang Xing, Dongqi Han
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further conduct comprehensive evaluations across various tasks, offering empirical insights into efficient and effective decision making. ... We empirically evaluate Habi on a diverse set of tasks from the D4RL dataset (Fu et al., 2020), one of the most widely used benchmarks for offline RL. ... All the results are calculated over 500 episode seeds for each task to provide a reliable evaluation. HI's results are additionally averaged over 5 training seeds to ensure robustness. |
| Researcher Affiliation | Collaboration | The work was conducted during the internship of Haofei Lu (EMAIL) at Microsoft Research Asia 1Department of Computer Science and Technology, Tsinghua University 2Microsoft Research Asia. Correspondence to: Junliang Xing <EMAIL>, Dongqi Han <EMAIL>. |
| Pseudocode | No | The paper describes methods and processes but does not include any clearly labeled pseudocode or algorithm blocks. Figure 3 is a diagram, not pseudocode. |
| Open Source Code | Yes | Our code is anonymously available at https://bayesbrain.github.io/. |
| Open Datasets | Yes | We empirically evaluate Habi on a diverse set of tasks from the D4RL dataset (Fu et al., 2020), one of the most widely used benchmarks for offline RL. |
| Dataset Splits | Yes | We empirically evaluate Habi on a diverse set of tasks from the D4RL dataset (Fu et al., 2020), one of the most widely used benchmarks for offline RL. ... All the results are calculated over 500 episode seeds for each task to provide a reliable evaluation. HI's results are additionally averaged over 5 training seeds to ensure robustness. |
| Hardware Specification | Yes | All runtime measurements were conducted on two different computing hardwares: a laptop CPU (Apple M2 Max) or a server GPU (Nvidia A100). Training was on Nvidia A100 GPUs. |
| Software Dependencies | No | The paper mentions 'Optimizer Adam' in Table 3 and notes that 'CleanDiffuser (Dong et al., 2024b)' was used to reproduce baselines, but it does not provide version numbers for any software libraries or dependencies used in the main methodology. |
| Experiment Setup | Yes | Table 3. Hyperparameters in our experiments: Optimizer Adam; Learning Rate 3e-4; Gradient Steps 1,000,000; Batch Size 256; Latent Dimension Dim(z) 256; MLP Hidden Size (Encoder & Decoder) 256; MLP Hidden Layers (Encoder & Decoder) 2; Habitization Target (Locomotion-related: MuJoCo, AntMaze) DQL (Wang et al., 2023); Habitization Target (Planning-related: Kitchen, Maze2D) DV (Lu et al., 2025); Target KL-divergence D_KL^tar 1.0; Number of Sampling Candidates in Habitization Training 50; Number of Sampling Candidates in Habitual Inference 5. |