M^3PC: Test-time Model Predictive Control using Pretrained Masked Trajectory Model

Authors: Kehan Wen, Yutong Hu, Yao Mu, Lei Ke

Venue: ICLR 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Empirical results on D4RL and RoboMimic show that our inference-phase MPC significantly improves the decision-making performance of a pretrained trajectory model without any additional parameter training. |
| Researcher Affiliation | Academia | 1 ETH Zurich, 2 KU Leuven, 3 Hong Kong University, 4 Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1: Forward M3PC for Reward Maximization |
| Open Source Code | Yes | Code is available: https://github.com/wkh923/m3pc |
| Open Datasets | Yes | To answer these questions, we utilize the D4RL and RoboMimic dataset suites. |
| Dataset Splits | No | No explicit train/validation/test splits (percentages or counts) are given for the D4RL and RoboMimic datasets in the main text. The paper uses these established benchmark suites but does not state how they were split for its experiments (e.g., an "80/10/10 split"). |
| Hardware Specification | Yes | The entire training process, including both pretraining and finetuning, is performed on NVIDIA 3090 GPUs. |
| Software Dependencies | No | The paper does not give version numbers for software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Table 4 in Appendix B ("Hyperparameters") lists the hyperparameters for both offline and online training, including batch size, learning rate, weight decay, target entropy, scheduler type, warmup steps, training steps, and architecture-specific parameters (number of encoder/decoder layers, heads, and embedding dimension). |
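The "inference-phase MPC" the table refers to can be sketched as a standard receding-horizon loop: sample candidate action sequences, score each with a pretrained trajectory model's predicted return, and execute only the best first action, with no parameter updates. This is a minimal illustration, not the paper's implementation; the uniform candidate sampler, the `predict_return` interface, and the `ToyTrajectoryModel` stand-in are all assumptions made for the sketch.

```python
import numpy as np

def mpc_action(model, history, horizon=4, n_candidates=64, rng=None):
    """Test-time MPC: pick the first action of the highest-return candidate
    sequence under the (frozen) pretrained model. Inference only."""
    rng = np.random.default_rng(rng)
    # Sample candidate action sequences (uniform here; the paper's sampler may differ).
    candidates = rng.uniform(-1.0, 1.0,
                             size=(n_candidates, horizon, model.action_dim))
    # Score each candidate by the model's predicted cumulative reward.
    returns = np.array([model.predict_return(history, seq) for seq in candidates])
    best = candidates[np.argmax(returns)]
    return best[0]  # receding horizon: execute only the first action

class ToyTrajectoryModel:
    """Stand-in for a pretrained masked trajectory model (illustration only)."""
    action_dim = 2

    def predict_return(self, history, action_seq):
        # Toy objective: prefer action sequences close to a fixed target.
        target = np.array([0.5, -0.5])
        return -np.abs(action_seq - target).sum()

a = mpc_action(ToyTrajectoryModel(), history=None, rng=0)
```

In the real method the scoring model is the pretrained masked trajectory model, so planning quality improves with the model while training cost at test time stays zero.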