M^3PC: Test-time Model Predictive Control using Pretrained Masked Trajectory Model
Authors: Kehan Wen, Yutong Hu, Yao Mu, Lei Ke
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on D4RL and RoboMimic show that our inference-phase MPC significantly improves the decision-making performance of a pretrained trajectory model without any additional parameter training. |
| Researcher Affiliation | Academia | ETH Zurich, KU Leuven, Hong Kong University, Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1 Forward M3PC for Reward Maximization |
| Open Source Code | Yes | Code is available: https://github.com/wkh923/m3pc. |
| Open Datasets | Yes | To answer these questions, we utilize the D4RL and RoboMimic dataset suites. |
| Dataset Splits | No | No explicit train/validation/test splits (as percentages or counts) are provided for the D4RL and RoboMimic datasets in the main text. The paper uses these established benchmark suites but does not detail how the data were split for its specific experiments (e.g., an "80/10/10" split). |
| Hardware Specification | Yes | The entire training process, including both pretraining and finetuning, is performed on NVIDIA 3090 GPUs. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Table 4 in Appendix B, "Hyperparameters", explicitly lists the hyperparameters for both offline and online training, including batch size, learning rate, weight decay, target entropy, scheduler type, warmup steps, training steps, and architecture-specific parameters such as the number of encoder/decoder layers, attention heads, and the embedding dimension. |