Bootstrapped Model Predictive Control

Authors: Yuhang Wang, Hanwei Guo, Sizhe Wang, Long Qian, Xuguang Lan

ICLR 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Through experiments, we show that learning a network policy through expert imitation can better leverage the strengths of MPC than learning a policy in a model-free manner, thus leading to better value estimation and MPC performance. Our method, BMPC, achieves superior sample efficiency over prior data-efficient RL methods across 42 continuous control tasks in DMControl (Tassa et al., 2018) and HumanoidBench (Sferrazza et al., 2024), with comparable training time and smaller network sizes. In particular, in challenging high-dimensional locomotion tasks, BMPC significantly improves data efficiency while also enhancing asymptotic performance and training stability.
Researcher Affiliation | Academia | Yuhang Wang, Hanwei Guo, Sizhe Wang, Long Qian, Xuguang Lan; National Key Laboratory of Human-Machine Hybrid Augmented Intelligence; National Engineering Research Center for Visual Information and Application; Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, China.
Pseudocode | Yes | Algorithm 1: BMPC training.
Open Source Code | Yes | Code is available at https://github.com/wertyuilife2/bmpc.
Open Datasets | Yes | Our method, BMPC, achieves superior sample efficiency over prior data-efficient RL methods across 42 continuous control tasks in DMControl (Tassa et al., 2018) and HumanoidBench (Sferrazza et al., 2024), with comparable training time and smaller network sizes.
Dataset Splits | No | The paper uses benchmark environments/tasks (DMControl and HumanoidBench) for evaluation and reports 'environment steps' for training. However, it does not provide explicit training/validation/test splits of observational data within these environments, nor does it specify exact percentages or sample counts for such splits. The splitting methodology is not detailed beyond the use of tasks for training and evaluation.
Hardware Specification | Yes | The experiments are conducted using a single RTX 3090 GPU.
Software Dependencies | No | The paper refers to using the latest code and default hyperparameters for baselines such as TD-MPC2 and DreamerV3, and notes that BMPC is built on TD-MPC2's world model and MPPI. However, it does not explicitly state the specific versions of programming languages, libraries, or other software dependencies used for its own implementation (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | We use the same hyperparameters for BMPC across all tasks (see Table 2: BMPC Hyperparameters); detailed baseline configurations are provided in Appendix B.
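The core claim summarized above is that a network policy distilled from an MPC planner via expert imitation better leverages the planner's strengths than model-free policy learning. As a minimal, self-contained sketch of that distillation idea (not BMPC itself: the toy scalar dynamics, cost weights, and all function names below are invented for illustration, whereas the paper uses TD-MPC2's learned world model and MPPI), the example builds an MPC expert for a one-dimensional linear system via a scalar Riccati recursion, labels states with the expert's first planned action, and clones the expert into a linear policy by closed-form least squares:

```python
def mpc_expert(x, horizon=10, q=1.0, r=0.1):
    """Toy MPC for the scalar system x_{t+1} = x_t + u_t with stage cost
    q*x^2 + r*u^2: solve the finite-horizon LQR by a backward scalar
    Riccati recursion, then apply only the first action (receding horizon)."""
    p = q  # terminal cost-to-go coefficient
    for _ in range(horizon):
        k_gain = p / (r + p)            # feedback gain for this step
        p = q + p - p * p / (r + p)     # Riccati backward recursion
    return -k_gain * x                  # first action of the optimal plan

# Imitation dataset: states labeled with the planner's actions.
states = [i / 10.0 for i in range(-20, 21)]
actions = [mpc_expert(x) for x in states]

# Behavior cloning of a linear policy u = -k*x, fit in closed form by
# least squares; a neural network trained with an MSE imitation loss
# plays this role in the setting the paper describes.
k = sum(-x * u for x, u in zip(states, actions)) / sum(x * x for x in states)
```

Because the toy expert is itself linear in the state, least squares recovers its feedback gain exactly; the point of the sketch is only the data flow (plan, label, imitate), which is the loop the quoted claim is about.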