Discrete Codebook World Models for Continuous Control

Authors: Aidan Scannell, Mohammadreza Nakhaeinezhadfard, Kalle Kujanpää, Yi Zhao, Kevin Luck, Arno Solin, Joni Pajarinen

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we experimentally evaluate DC-MPC in a variety of continuous control tasks from the DeepMind Control Suite (DMControl) (Tassa et al., 2018), Meta-World (Yu et al., 2019) and MyoSuite (Vittorio et al., 2022) against a number of baselines and ablations. Our experiments seek to answer the following research questions: RQ1: Does DC-MPC's discrete latent space offer benefits over a continuous latent space? RQ2: What is important for learning a latent space: (i) classification loss, (ii) discrete codebook, (iii) stochastic dynamics or (iv) multimodal dynamics? RQ3: Does DC-MPC's codebook offer benefits for dynamics/value/policy learning over alternative discrete encodings such as (i) one-hot encoding (similar to DreamerV2) and (ii) label encoding? RQ4: How does DC-MPC compare to state-of-the-art model-based RL algorithms leveraging latent state embeddings, especially in the hard DMControl and Meta-World tasks?
Researcher Affiliation | Academia | Aidan Scannell (University of Edinburgh), Mohammadreza Nakhaei (Aalto University), Kalle Kujanpää (Aalto University), Yi Zhao (Aalto University), Kevin Sebastian Luck (Vrije Universiteit Amsterdam), Arno Solin (Aalto University), Joni Pajarinen (Aalto University)
Pseudocode | Yes | See Fig. 1 for an overview of DCWM, Alg. 1 for details of world model training and Alg. 2 for details on the MPPI planning procedure.
Open Source Code | Yes | For full details of the implementation, model architectures, and training, please check the code, which is available in the submitted supplementary material and on GitHub at https://github.com/aidanscannell/dcmpc.
Open Datasets | Yes | In this section, we experimentally evaluate DC-MPC in a variety of continuous control tasks from the DeepMind Control Suite (DMControl) (Tassa et al., 2018), Meta-World (Yu et al., 2019) and MyoSuite (Vittorio et al., 2022) against a number of baselines and ablations.
Dataset Splits | No | The paper uses continuous control tasks from the DeepMind Control Suite, Meta-World, and MyoSuite. These are interactive environments where agents collect data, rather than static datasets with predefined train/test/validation splits.
Hardware Specification | Yes | We used NVIDIA A100s and AMD Instinct MI250X GPUs to run our experiments. All our experiments have been run on a single GPU with a single-digit number of CPU workers.
Software Dependencies | No | We implemented DC-MPC with PyTorch (Paszke et al., 2019) and used the AdamW optimizer (Kingma & Ba, 2015) for training the models. While PyTorch and AdamW are mentioned with citations, specific version numbers for the software components are not provided.
Experiment Setup | Yes | Table 1 lists all of the hyperparameters for training DC-MPC which were used for the main experiments and the ablations.
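
As context for RQ3's comparison of discrete encodings, the core codebook operation is nearest-neighbour quantization: a continuous latent vector is replaced by the closest entry in a learned codebook. The sketch below is illustrative NumPy, not the paper's implementation; the 2-D codebook and inputs are invented for the example.

```python
import numpy as np

def quantize(z, codebook):
    """Map each continuous latent to its nearest codebook entry (L2 distance).

    z: (batch, dim) continuous latents; codebook: (num_codes, dim).
    Returns the quantized latents and their integer code indices.
    """
    # Pairwise squared distances between each latent and each code.
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)          # discrete code index per latent
    return codebook[idx], idx

# Toy 3-entry codebook in 2-D (hypothetical, for illustration only).
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]])
zq, idx = quantize(np.array([[0.9, 1.2], [-0.1, 0.05]]), codebook)
# idx → [1, 0]: each latent snaps to its closest code.
```

The integer indices are what distinguish this from the one-hot and label encodings named in RQ3: the codebook stores a learned vector per index, whereas one-hot and label encodings fix the representation by construction.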
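
The MPPI planning procedure referenced under Pseudocode (Alg. 2) follows a standard pattern: sample candidate action sequences, roll them through the model, and return an exponentially reward-weighted average action. A minimal generic sketch on a toy 1-D system; the hyperparameters, dynamics, and reward here are illustrative placeholders, not the paper's.

```python
import numpy as np

def mppi(dynamics, reward, state, horizon=10, samples=256,
         sigma=0.5, temperature=1.0, seed=0):
    """Generic MPPI sketch: returns the first action of the
    reward-weighted average action sequence."""
    rng = np.random.default_rng(seed)
    actions = rng.normal(0.0, sigma, size=(samples, horizon))
    returns = np.zeros(samples)
    for i in range(samples):          # roll each candidate through the model
        s = state
        for t in range(horizon):
            returns[i] += reward(s, actions[i, t])
            s = dynamics(s, actions[i, t])
    # Softmax over returns (max-subtracted for numerical stability).
    w = np.exp((returns - returns.max()) / temperature)
    w /= w.sum()
    return float((w[:, None] * actions).sum(axis=0)[0])

# Toy 1-D system: the state moves by the action; reward penalises
# distance from the origin, so good plans push the state toward 0.
a0 = mppi(dynamics=lambda s, a: s + a,
          reward=lambda s, a: -(s + a) ** 2,
          state=1.0)
```

In DC-MPC this rollout happens in the learned discrete latent space rather than on raw states, but the sampling-and-reweighting structure is the same.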