Discrete Codebook World Models for Continuous Control

Authors: Aidan Scannell, Mohammadreza Nakhaeinezhadfard, Kalle Kujanpää, Yi Zhao, Kevin Luck, Arno Solin, Joni Pajarinen

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we experimentally evaluate DC-MPC in a variety of continuous control tasks from the DeepMind Control Suite (DMControl) (Tassa et al., 2018), Meta-World (Yu et al., 2019) and MyoSuite (Vittorio et al., 2022) against a number of baselines and ablations. Our experiments seek to answer the following research questions: RQ1: Does DC-MPC's discrete latent space offer benefits over a continuous latent space? RQ2: What is important for learning a latent space: (i) classification loss, (ii) discrete codebook, (iii) stochastic dynamics or (iv) multimodal dynamics? RQ3: Does DC-MPC's codebook offer benefits for dynamics/value/policy learning over alternative discrete encodings such as (i) one-hot encoding (similar to DreamerV2) and (ii) label encoding? RQ4: How does DC-MPC compare to state-of-the-art model-based RL algorithms leveraging latent state embeddings, especially in the hard DMControl and Meta-World tasks?
Researcher Affiliation | Academia | Aidan Scannell (University of Edinburgh), Mohammadreza Nakhaei (Aalto University), Kalle Kujanpää (Aalto University), Yi Zhao (Aalto University), Kevin Sebastian Luck (Vrije Universiteit Amsterdam), Arno Solin (Aalto University), Joni Pajarinen (Aalto University)
Pseudocode | Yes | See Fig. 1 for an overview of DCWM, Alg. 1 for details of world model training and Alg. 2 for details on the MPPI planning procedure.
Open Source Code | Yes | For full details of the implementation, model architectures, and training, please check the code, which is available in the submitted supplementary material and on GitHub at https://github.com/aidanscannell/dcmpc.
Open Datasets | Yes | In this section, we experimentally evaluate DC-MPC in a variety of continuous control tasks from the DeepMind Control Suite (DMControl) (Tassa et al., 2018), Meta-World (Yu et al., 2019) and MyoSuite (Vittorio et al., 2022) against a number of baselines and ablations.
Dataset Splits | No | The paper uses continuous control tasks from the DeepMind Control Suite, Meta-World, and MyoSuite. These are interactive environments where agents collect data, rather than static datasets with predefined train/test/validation splits.
Hardware Specification | Yes | We used NVIDIA A100s and AMD Instinct MI250X GPUs to run our experiments. All our experiments have been run on a single GPU with a single-digit number of CPU workers.
Software Dependencies | No | We implemented DC-MPC with PyTorch (Paszke et al., 2019) and used the AdamW optimizer (Kingma & Ba, 2015) for training the models. While PyTorch and AdamW are mentioned with citations, specific version numbers for the software components are not provided.
Experiment Setup | Yes | Table 1 lists all of the hyperparameters for training DC-MPC which were used for the main experiments and the ablations.
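
As context for RQ3's comparison of discrete encodings, the core codebook operation is nearest-neighbour quantization: a continuous latent vector is replaced by the closest entry in a learned codebook. The sketch below is illustrative NumPy, not the paper's implementation; the 2-D codebook and inputs are invented for the example.

```python
import numpy as np

def quantize(z, codebook):
    """Map each continuous latent to its nearest codebook entry (L2 distance).

    z: (batch, dim) continuous latents; codebook: (num_codes, dim).
    Returns the quantized latents and their integer code indices.
    """
    # Pairwise squared distances between each latent and each code.
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)          # discrete code index per latent
    return codebook[idx], idx

# Toy 3-entry codebook in 2-D (hypothetical, for illustration only).
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]])
zq, idx = quantize(np.array([[0.9, 1.2], [-0.1, 0.05]]), codebook)
# idx → [1, 0]: each latent snaps to its closest code.
```

The integer indices are what distinguish this from the one-hot and label encodings named in RQ3: the codebook stores a learned vector per index, whereas one-hot and label encodings fix the representation by construction.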
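
The MPPI planning procedure referenced under Pseudocode (Alg. 2) follows a standard pattern: sample candidate action sequences, roll them through the model, and return an exponentially reward-weighted average action. A minimal generic sketch on a toy 1-D system; the hyperparameters, dynamics, and reward here are illustrative placeholders, not the paper's.

```python
import numpy as np

def mppi(dynamics, reward, state, horizon=10, samples=256,
         sigma=0.5, temperature=1.0, seed=0):
    """Generic MPPI sketch: returns the first action of the
    reward-weighted average action sequence."""
    rng = np.random.default_rng(seed)
    actions = rng.normal(0.0, sigma, size=(samples, horizon))
    returns = np.zeros(samples)
    for i in range(samples):          # roll each candidate through the model
        s = state
        for t in range(horizon):
            returns[i] += reward(s, actions[i, t])
            s = dynamics(s, actions[i, t])
    # Softmax over returns (max-subtracted for numerical stability).
    w = np.exp((returns - returns.max()) / temperature)
    w /= w.sum()
    return float((w[:, None] * actions).sum(axis=0)[0])

# Toy 1-D system: the state moves by the action; reward penalises
# distance from the origin, so good plans push the state toward 0.
a0 = mppi(dynamics=lambda s, a: s + a,
          reward=lambda s, a: -(s + a) ** 2,
          state=1.0)
```

In DC-MPC this rollout happens in the learned discrete latent space rather than on raw states, but the sampling-and-reweighting structure is the same.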