Bootstrapped Model Predictive Control

Authors: Yuhang Wang, Hanwei Guo, Sizhe Wang, Long Qian, Xuguang Lan

ICLR 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Through experiments, we show that learning a network policy through expert imitation can better leverage the strengths of MPC than learning a policy in a model-free manner, thus leading to better value estimation and MPC performance. Our method, BMPC, achieves superior sample efficiency over prior data-efficient RL methods across 42 continuous control tasks in DMControl (Tassa et al., 2018) and HumanoidBench (Sferrazza et al., 2024), with comparable training time and smaller network sizes. In particular, in challenging high-dimensional locomotion tasks, BMPC significantly improves data efficiency while also enhancing asymptotic performance and training stability.
Researcher Affiliation | Academia | Yuhang Wang, Hanwei Guo, Sizhe Wang, Long Qian, Xuguang Lan; National Key Laboratory of Human-Machine Hybrid Augmented Intelligence; National Engineering Research Center for Visual Information and Application; Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, China.
Pseudocode | Yes | Algorithm 1: BMPC training.
Open Source Code | Yes | Code is available at https://github.com/wertyuilife2/bmpc.
Open Datasets | Yes | Our method, BMPC, achieves superior sample efficiency over prior data-efficient RL methods across 42 continuous control tasks in DMControl (Tassa et al., 2018) and HumanoidBench (Sferrazza et al., 2024), with comparable training time and smaller network sizes.
Dataset Splits | No | The paper uses benchmark environments/tasks (DMControl and HumanoidBench) for evaluation and reports 'environment steps' for training. However, it does not provide explicit training/validation/test splits of observational data within these environments, nor does it specify exact percentages or sample counts for such splits. The splitting methodology is not detailed beyond the use of tasks for training and evaluation.
Hardware Specification | Yes | The experiments are conducted using a single RTX 3090 GPU.
Software Dependencies | No | The paper refers to using the latest code and default hyperparameters for baselines such as TD-MPC2 and DreamerV3, and notes that BMPC is built on TD-MPC2's world model and MPPI. However, it does not explicitly state the specific versions of programming languages, libraries, or other software dependencies used for its own implementation (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | We use the same hyperparameters for BMPC across all tasks (see Table 2: BMPC Hyperparameters); detailed baseline configurations are provided in Appendix B.
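The core claim summarized above is that a network policy distilled from an MPC planner via expert imitation better leverages the planner's strengths than model-free policy learning. As a minimal, self-contained sketch of that distillation idea (not BMPC itself: the toy scalar dynamics, cost weights, and all function names below are invented for illustration, whereas the paper uses TD-MPC2's learned world model and MPPI), the example builds an MPC expert for a one-dimensional linear system via a scalar Riccati recursion, labels states with the expert's first planned action, and clones the expert into a linear policy by closed-form least squares:

```python
def mpc_expert(x, horizon=10, q=1.0, r=0.1):
    """Toy MPC for the scalar system x_{t+1} = x_t + u_t with stage cost
    q*x^2 + r*u^2: solve the finite-horizon LQR by a backward scalar
    Riccati recursion, then apply only the first action (receding horizon)."""
    p = q  # terminal cost-to-go coefficient
    for _ in range(horizon):
        k_gain = p / (r + p)            # feedback gain for this step
        p = q + p - p * p / (r + p)     # Riccati backward recursion
    return -k_gain * x                  # first action of the optimal plan

# Imitation dataset: states labeled with the planner's actions.
states = [i / 10.0 for i in range(-20, 21)]
actions = [mpc_expert(x) for x in states]

# Behavior cloning of a linear policy u = -k*x, fit in closed form by
# least squares; a neural network trained with an MSE imitation loss
# plays this role in the setting the paper describes.
k = sum(-x * u for x, u in zip(states, actions)) / sum(x * x for x in states)
```

Because the toy expert is itself linear in the state, least squares recovers its feedback gain exactly; the point of the sketch is only the data flow (plan, label, imitate), which is the loop the quoted claim is about.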