Diffusion Model Predictive Control
Authors: Guangyao Zhou, Sivaramakrishnan Swaminathan, Rajkumar Vasudeva Raju, J Swaroop Guntupalli, Wolfgang Lehrach, Joseph Ortiz, Antoine Dedieu, Miguel Lazaro-Gredilla, Kevin Patrick Murphy
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show performance that is significantly better than existing model-based offline planning methods using MPC (e.g. MBOP (Argenson & Dulac-Arnold, 2021)) and competitive with state-of-the-art (SOTA) model-based and model-free reinforcement learning methods. We additionally illustrate D-MPC's ability to optimize novel reward functions at run time and adapt to novel dynamics, and highlight its advantages compared to existing diffusion-based planning baselines. In this section, we conduct various experiments to evaluate the effectiveness of D-MPC. Specifically, we seek to answer the following questions with our experiments: 1. Does our proposed D-MPC improve performance over existing MPC approaches (which learn the model offline), and can it perform competitively with standard model-based and model-free offline RL methods? 2. Can D-MPC optimize novel rewards and quickly adapt to new environment dynamics at run time? 3. How do the different components of D-MPC contribute to its improved performance? 4. Can we distill D-MPC into a fast reactive policy for high-frequency control? |
| Researcher Affiliation | Industry | Guangyao Zhou EMAIL Google DeepMind Sivaramakrishnan Swaminathan EMAIL Google DeepMind Rajkumar Vasudeva Raju EMAIL Google DeepMind J. Swaroop Guntupalli EMAIL Google DeepMind Wolfgang Lehrach EMAIL Google DeepMind Joseph Ortiz EMAIL Google DeepMind Antoine Dedieu EMAIL Google DeepMind Miguel Lázaro-Gredilla EMAIL Google DeepMind Kevin Murphy EMAIL Google DeepMind |
| Pseudocode | Yes | The overall MPC pseudocode is provided in Algorithm 1. Algorithm 1: Main MPC loop. Algorithm 2: Sampling-based planner with learned multi-step diffusion action proposals |
| Open Source Code | No | The paper does not contain an explicit statement about releasing code or a link to a code repository. It only describes the methodology and implementation details. |
| Open Datasets | Yes | We evaluate the performance of our proposed D-MPC on various D4RL (Fu et al., 2020) tasks. |
| Dataset Splits | No | The paper mentions training on 'medium datasets from the respective domains' and evaluating using 'state/action sequences sampled from the medium (training data), medium-replay (lower quality data) and expert (higher quality data) datasets'. However, it does not provide specific ratios, percentages, or absolute counts for training, validation, and test splits within a single dataset. It uses different quality datasets for different purposes rather than defining explicit splits of a single dataset for model training and evaluation. |
| Hardware Specification | Yes | We train and evaluate all models on A100 GPUs. We use a single A100 GPU for each training run, and separate worker with a single A100 GPU for evaluation. |
| Software Dependencies | No | The paper mentions using DDIM with a cosine schedule for diffusion implementation and the Adam optimizer for training. However, it does not provide specific version numbers for these or any other software libraries or programming languages (e.g., Python 3.x, PyTorch 1.x, CUDA 11.x). |
| Experiment Setup | Yes | For all of our model training, we use the Adam optimizer, for which the learning rate warms up from 0 to 10⁻⁴ over 500 steps and then follows a cosine decay schedule from 10⁻⁴ to 10⁻⁵. We train all models for 2×10⁶ steps. We use gradient clipping at norm 5, and use EMA with a decay factor of 0.99. All of our evaluations are done using the EMA parameters for the models. For all of our experiments, we use a forecast horizon F = 32, number of samples N = 64, and a history length H = 1. |
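The training schedule quoted above (linear warmup from 0 to 10⁻⁴ over 500 steps, cosine decay down to 10⁻⁵ over 2×10⁶ total steps) is fully specified and can be sketched as a small pure-Python function. This is a reconstruction from the stated hyperparameters, not code from the paper; the function name and signature are our own.

```python
import math

def lr_schedule(step, peak=1e-4, final=1e-5, warmup=500, total=2_000_000):
    """Learning rate at a given training step, per the reported setup:
    linear warmup 0 -> peak over `warmup` steps, then cosine decay
    from peak down to `final` over the remaining steps."""
    if step < warmup:
        return peak * step / warmup
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup) / (total - warmup)
    return final + 0.5 * (peak - final) * (1.0 + math.cos(math.pi * progress))
```

For example, `lr_schedule(0)` is 0, `lr_schedule(500)` is the peak 10⁻⁴, and `lr_schedule(2_000_000)` is the floor 10⁻⁵; the paper additionally applies gradient clipping at norm 5 and evaluates with EMA parameters (decay 0.99), which sit outside this schedule.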
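The report confirms the paper provides pseudocode (Algorithm 1: main MPC loop; Algorithm 2: sampling-based planner with diffusion action proposals) but does not reproduce it. As a hedged illustration only, a generic sampling-based MPC step consistent with the reported settings (N = 64 proposals, forecast horizon F = 32) might look as follows; the interfaces `sample_actions`, `rollout_dynamics`, and `reward_fn` are hypothetical stand-ins for the paper's learned diffusion proposal, dynamics, and reward models.

```python
import numpy as np

def mpc_step(obs, sample_actions, rollout_dynamics, reward_fn,
             n_samples=64, horizon=32):
    """One step of a generic sampling-based MPC loop (illustrative sketch):
    draw N candidate action sequences, score each by rolling out a learned
    dynamics model and summing predicted rewards, then return the first
    action of the highest-scoring sequence (to be executed before replanning)."""
    # Candidate action sequences, shape (n_samples, horizon, action_dim).
    candidates = sample_actions(obs, n_samples, horizon)
    returns = []
    for actions in candidates:
        # Predicted state trajectory under this action sequence.
        states = rollout_dynamics(obs, actions)
        returns.append(sum(reward_fn(s, a) for s, a in zip(states, actions)))
    best = int(np.argmax(returns))
    return candidates[best][0]
```

In a full MPC loop this step would be repeated at every environment step, re-planning from the newly observed state each time.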