Reward-free World Models for Online Imitation Learning
Authors: Shangzhe Li, Zhiao Huang, Hao Su
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on a diverse set of benchmarks, including DMControl, MyoSuite, and ManiSkill2, demonstrating superior empirical performance compared to existing approaches. |
| Researcher Affiliation | Collaboration | ¹South China University of Technology, ²Hillbot Inc., ³University of California, San Diego. Correspondence to: Shangzhe Li <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 IQ-MPC (inference) and Algorithm 2 IQ-MPC (training) are presented in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement of code release or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We evaluate our method on a diverse set of benchmarks, including DMControl (Tunyasuvunakool et al., 2020), MyoSuite (Caggiano et al., 2022), and ManiSkill2 (Gu et al., 2023), demonstrating superior empirical performance compared to existing approaches. |
| Dataset Splits | Yes | We use 100 expert trajectories for low-dimensional tasks (Hopper, Walker, Quadruped, Cheetah), 500 for Humanoid, and 1000 for Dog (both high-dimensional). Each trajectory contains 500 steps, sampled using trained TD-MPC2 world models (Hansen et al., 2023). We leverage 100 expert trajectories with 100 steps sampled from trained TD-MPC2 for each task. |
| Hardware Specification | Yes | All baselines are trained using a single RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper uses Pytorch-like notation for describing the architecture but does not specify version numbers for key software components like PyTorch itself or other libraries used for implementation. |
| Experiment Setup | Yes | The detailed hyperparameters used in IQ-MPC are as follows: The batch size during training is 256. We leverage λ = 0.5 over the planning horizon. ... We set the learning rate of the model to 3e-4. The entropy coefficient β = 1e-4. ... We use soft update coefficient τ = 0.01. |
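The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is illustrative only: the key names and the `soft_update` helper are assumptions, not taken from the paper.

```python
# Hedged sketch: reported IQ-MPC hyperparameters gathered into one dict.
# Key names are illustrative; the paper does not specify a config format.
iq_mpc_hparams = {
    "batch_size": 256,       # training batch size
    "lambda_horizon": 0.5,   # λ used over the planning horizon
    "model_lr": 3e-4,        # world-model learning rate
    "entropy_coef": 1e-4,    # entropy coefficient β
    "soft_update_tau": 0.01, # τ for target-network soft updates
}


def soft_update(target, online, tau=iq_mpc_hparams["soft_update_tau"]):
    """Polyak averaging of parameter lists (plain floats, for illustration).

    With τ = 0.01, each target parameter moves 1% of the way toward the
    online parameter per update, matching the reported soft update coefficient.
    """
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]
```

A usage sanity check: `soft_update([1.0], [2.0])` moves the target value from 1.0 a small step toward 2.0, landing near 1.01.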