Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient

Authors: Wenlong Wang, Ivana Dusparic, Yucheng Shi, Ke Zhang, Vinny Cahill

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate Drama on the Atari100k benchmark, demonstrating that it achieves performance comparable to other SOTA algorithms while using a world model with only 7 million trainable parameters. We present three ablation experiments to evaluate key components of Drama... In Table 1, the Normalised Mean refers to the average normalised score... As shown in Figure 4 in the appendix, Drama significantly outperforms DreamerV3 XS, achieving a normalised mean score of 105 compared to 37 and a normalised median score of 27 compared to 7, as presented in Table 3.
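The normalised mean and median above follow the usual Atari100k convention of human-normalised scores, (agent − random) / (human − random); the excerpt does not quote the formula, so this is the standard convention rather than the paper's stated definition. A minimal sketch with placeholder game scores (not the paper's results):

```python
def normalised_score(agent, random_play, human):
    """Human-normalised score: 0 = random play, 1 = human-level performance."""
    return (agent - random_play) / (human - random_play)

# Placeholder per-game raw scores (agent, random, human) -- illustrative only.
raw = {
    "GameA": (500.0, 100.0, 1000.0),
    "GameB": (30.0, 10.0, 50.0),
}

scores = sorted(normalised_score(a, r, h) for a, r, h in raw.values())
norm_mean = sum(scores) / len(scores)
mid = len(scores) // 2
norm_median = scores[mid] if len(scores) % 2 else (scores[mid - 1] + scores[mid]) / 2
```

The mean is sensitive to a few high-scoring games, which is why Atari100k papers typically report the median alongside it.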
Researcher Affiliation | Academia | Wenlong Wang, Ivana Dusparic, Yucheng Shi, Ke Zhang & Vinny Cahill, School of Computer Science and Statistics, Trinity College Dublin, the University of Dublin, College Green, Dublin 2, Ireland
Pseudocode | Yes | Algorithm 1: Training the world model and the behaviour policy
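Algorithm 1 is not reproduced in this excerpt; the Dreamer/STORM family it belongs to alternates world-model fitting, imagined rollouts, and actor-critic updates. The sketch below shows only that generic alternation with dummy stand-in components, not the paper's actual algorithm:

```python
import random

# Dummy stand-ins for the real components -- illustrative only.
class DummyWorldModel:
    def train_step(self, batch):
        return 0.0  # placeholder reconstruction/prediction loss

    def imagine(self, horizon):
        # Imagined rollout: (reward, done) pairs sampled from the model.
        return [(random.random(), False) for _ in range(horizon)]

class DummyActorCritic:
    def train_step(self, rollout):
        return 0.0  # placeholder actor-critic loss

replay = []
world_model, policy = DummyWorldModel(), DummyActorCritic()

for step in range(100):             # limited environment-interaction budget
    replay.append(random.random())  # 0) collect a transition (dummy)
    batch = replay[-16:]            #    sample a recent batch
    wm_loss = world_model.train_step(batch)    # 1) fit the world model
    rollout = world_model.imagine(horizon=16)  # 2) imagine trajectories
    ac_loss = policy.train_step(rollout)       # 3) train the behaviour policy
```

The key property this loop illustrates is that the policy is trained on imagined rollouts, so real environment steps are spent only on world-model data collection.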
Open Source Code | Yes | Our code is available at https://github.com/realwenlongwang/Drama.git.
Open Datasets | Yes | We evaluate the model using the Atari100k benchmark (Kaiser et al., 2020), which is widely used for assessing the sample efficiency of RL algorithms.
Dataset Splits | Yes | Atari100k limits interactions with the environment to 100,000 steps (equivalent to 400,000 frames with 4-frame skipping). For each game, we train Drama with 5 independent seeds and track training performance using a 5-episode running average, as recommended by Machado et al. (2018), a practice also followed in related work (Hafner et al., 2023).
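The 5-episode running average used to track training performance can be sketched with a fixed-size buffer; the episode returns below are illustrative, not real training data:

```python
from collections import deque

def running_average(episode_returns, window=5):
    """Mean of the last `window` episode returns, emitted after each episode."""
    buf = deque(maxlen=window)   # old returns fall out automatically
    out = []
    for r in episode_returns:
        buf.append(r)
        out.append(sum(buf) / len(buf))
    return out

# Illustrative episode returns -- placeholders, not Drama's results.
curve = running_average([10, 20, 30, 40, 50, 60, 70])
```

Early entries average over fewer than 5 episodes (the buffer is not yet full), which keeps the curve defined from the first episode onward.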
Hardware Specification | Yes | Drama is accessible and trainable on off-the-shelf hardware, such as a standard laptop. Experiments were conducted on a consumer-grade laptop with an NVIDIA RTX 2000 Ada Mobile GPU, ensuring practical relevance to resource-constrained settings.
Software Dependencies | No | The paper mentions being "implemented on top of the STORM infrastructure (Zhang et al., 2023)" and references an implementation using "pure PyTorch and MLX" for a specific experiment. However, it does not provide specific version numbers for key software components used for the main Drama implementation.
Experiment Setup | Yes | Appendix A.4 LOSS AND HYPERPARAMETERS details the hyperparameters for various components: A.4.1 VARIATIONAL AUTOENCODER (Table 5), A.4.2 MAMBA AND MAMBA-2 (Table 6), A.4.3 REWARD AND TERMINATION PREDICTION HEADS (Table 7), and A.4.4 ACTOR CRITIC HYPERPARAMETERS (Table 8). These tables specify values like learning rate, frame shape, layers, filters, stride, kernel, weight decay, activation functions, hidden state dimensions, gamma, lambda, entropy coefficient, max gradient norm, and batch sizes.
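The four hyperparameter groups listed (VAE, Mamba, prediction heads, actor-critic) can be organised as a nested configuration; the skeleton below mirrors that grouping, but every value is a placeholder, not taken from Tables 5-8 of the paper:

```python
# Illustrative configuration skeleton -- all values are placeholders,
# NOT the settings reported in the paper's Tables 5-8.
config = {
    "vae": {
        "learning_rate": 1e-4,
        "frame_shape": (64, 64, 3),
        "filters": 32,
        "stride": 2,
        "kernel": 3,
    },
    "mamba": {
        "layers": 2,
        "hidden_state_dim": 512,
        "weight_decay": 1e-2,
        "activation": "silu",
    },
    "heads": {
        "reward": {"layers": 1},
        "termination": {"layers": 1},
    },
    "actor_critic": {
        "gamma": 0.99,
        "lambda": 0.95,
        "entropy_coefficient": 3e-4,
        "max_grad_norm": 1.0,
        "batch_size": 32,
    },
}
```

Grouping hyperparameters per component like this makes the ablations described in the paper (swapping or disabling one component) a matter of editing one sub-dictionary.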