reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Reward-free World Models for Online Imitation Learning

Authors: Shangzhe Li, Zhiao Huang, Hao Su

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our method on a diverse set of benchmarks, including DMControl, MyoSuite, and ManiSkill2, demonstrating superior empirical performance compared to existing approaches.
Researcher Affiliation	Collaboration	(1) South China University of Technology; (2) Hillbot Inc.; (3) University of California, San Diego. Correspondence to: Shangzhe Li <EMAIL>.
Pseudocode	Yes	Algorithm 1 IQ-MPC (inference) and Algorithm 2 IQ-MPC (training) are presented in the paper.
Open Source Code	No	The paper does not provide an explicit statement of code release or a link to a code repository for the methodology described.
Open Datasets	Yes	We evaluate our method on a diverse set of benchmarks, including DMControl (Tunyasuvunakool et al., 2020), MyoSuite (Caggiano et al., 2022), and ManiSkill2 (Gu et al., 2023), demonstrating superior empirical performance compared to existing approaches.
Dataset Splits	Yes	We use 100 expert trajectories for low-dimensional tasks (Hopper, Walker, Quadruped, Cheetah), 500 for Humanoid, and 1000 for Dog (both high-dimensional). Each trajectory contains 500 steps, sampled using trained TD-MPC2 world models (Hansen et al., 2023). We leverage 100 expert trajectories with 100 steps sampled from trained TD-MPC2 for each task.
Hardware Specification	Yes	All baselines are trained using a single RTX 2080 Ti GPU.
Software Dependencies	No	The paper uses PyTorch-like notation for describing the architecture but does not specify version numbers for key software components like PyTorch itself or other libraries used for implementation.
Experiment Setup	Yes	The detailed hyperparameters used in IQ-MPC are as follows: The batch size during training is 256. We leverage λ = 0.5 in a horizon. ... We set the learning rate of the model to 3e-4. The entropy coefficient β = 1e-4. ... We use soft update coefficient τ = 0.01.
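
For readers attempting a reimplementation, the hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object. This is a minimal sketch, not the authors' code: the class name, the field names, and the `soft_update` helper are hypothetical, and the helper assumes that τ is used in the conventional Polyak-averaging sense (target = (1 - τ) · target + τ · online), which the quoted text does not spell out. Hyperparameters elided by "..." in the row are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IQMPCHyperparams:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    batch_size: int = 256
    lam: float = 0.5              # "λ = 0.5 in a horizon"
    learning_rate: float = 3e-4
    entropy_coef_beta: float = 1e-4
    soft_update_tau: float = 0.01


def soft_update(target, online, tau):
    """Polyak averaging over flat parameter lists (assumed meaning of tau).

    Returns a new list: (1 - tau) * target + tau * online, element-wise.
    """
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


cfg = IQMPCHyperparams()
# Toy parameters: the target moves 1% of the way toward the online values.
new_target = soft_update([0.0, 1.0], [1.0, 1.0], cfg.soft_update_tau)
```

With τ = 0.01 the target parameters track the online parameters slowly, which is the usual motivation for a small soft update coefficient.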