Reward-free World Models for Online Imitation Learning

Authors: Shangzhe Li, Zhiao Huang, Hao Su

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on a diverse set of benchmarks, including DMControl, MyoSuite, and ManiSkill2, demonstrating superior empirical performance compared to existing approaches.
Researcher Affiliation | Collaboration | (1) South China University of Technology; (2) Hillbot Inc.; (3) University of California, San Diego. Correspondence to: Shangzhe Li <EMAIL>.
Pseudocode | Yes | Algorithm 1 IQ-MPC (inference) and Algorithm 2 IQ-MPC (training) are presented in the paper.
Open Source Code | No | The paper does not provide an explicit statement of code release or a link to a code repository for the methodology described.
Open Datasets | Yes | We evaluate our method on a diverse set of benchmarks, including DMControl (Tunyasuvunakool et al., 2020), MyoSuite (Caggiano et al., 2022), and ManiSkill2 (Gu et al., 2023), demonstrating superior empirical performance compared to existing approaches.
Dataset Splits | Yes | We use 100 expert trajectories for low-dimensional tasks (Hopper, Walker, Quadruped, Cheetah), 500 for Humanoid, and 1000 for Dog (both high-dimensional). Each trajectory contains 500 steps, sampled using trained TD-MPC2 world models (Hansen et al., 2023). We leverage 100 expert trajectories with 100 steps sampled from trained TD-MPC2 for each task.
Hardware Specification | Yes | All baselines are trained using a single RTX 2080 Ti GPU.
Software Dependencies | No | The paper uses PyTorch-like notation to describe the architecture but does not specify version numbers for key software components such as PyTorch itself or other libraries used in the implementation.
Experiment Setup | Yes | The detailed hyperparameters used in IQ-MPC are as follows: The batch size during training is 256. We leverage λ = 0.5 in a horizon. ... We set the learning rate of the model to 3e-4. The entropy coefficient β = 1e-4. ... We use soft update coefficient τ = 0.01.
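
As a rough illustration of the quoted Experiment Setup values, the sketch below collects them in a small config object and applies one soft (Polyak-style) target update with τ = 0.01. The class name, field names, and the `soft_update` helper are hypothetical, and the update direction (target moves toward the online parameters by a fraction τ) is the common convention, assumed here rather than confirmed by the paper. Plain Python lists stand in for network parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IQMPCHyperparams:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    batch_size: int = 256
    horizon_lambda: float = 0.5   # the quoted "lambda = 0.5 in a horizon"
    learning_rate: float = 3e-4
    entropy_coef: float = 1e-4    # beta
    tau: float = 0.01             # soft update coefficient


def soft_update(target, online, tau):
    """Assumed convention: return (1 - tau) * target + tau * online, elementwise."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


hp = IQMPCHyperparams()
target_params = [0.0, 1.0]   # toy stand-ins for target-network weights
online_params = [1.0, 1.0]   # toy stand-ins for online-network weights
target_params = soft_update(target_params, online_params, hp.tau)
```

With τ = 0.01, each update moves a target parameter 1% of the way toward its online counterpart, so parameters that already agree stay unchanged.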