Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient
Authors: Wenlong Wang, Ivana Dusparic, Yucheng Shi, Ke Zhang, Vinny Cahill
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Drama on the Atari100k benchmark, demonstrating that it achieves performance comparable to other SOTA algorithms while using only a 7 million trainable parameter world model. We present three ablation experiments to evaluate key components of Drama... In Table 1, the Normalised Mean refers to the average normalised score... As shown in Figure 4 in the appendix, Drama significantly outperforms Dreamer V3XS, achieving a normalised mean score of 105 compared to 37 and a normalised median score of 27 compared to 7, as presented in Table 3. |
| Researcher Affiliation | Academia | Wenlong Wang, Ivana Dusparic, Yucheng Shi, Ke Zhang & Vinny Cahill School of Computer Science and Statistics Trinity College Dublin, the University of Dublin College Green, Dublin 2, Ireland EMAIL |
| Pseudocode | Yes | Algorithm 1 Training the world model and the behaviour policy |
| Open Source Code | Yes | Our code is available at https://github.com/realwenlongwang/Drama.git. |
| Open Datasets | Yes | We evaluate the model using the Atari100k benchmark (Kaiser et al., 2020), which is widely used for assessing the sample efficiency of RL algorithms. |
| Dataset Splits | Yes | Atari100k limits interactions with the environment to 100,000 steps (equivalent to 400,000 frames with 4-frame skipping). For each game, we train Drama with 5 independent seeds and track training performance using a 5-episode running average, as recommended by Machado et al. (2018), a practice also followed in related work (Hafner et al., 2023). |
| Hardware Specification | Yes | Drama is accessible and trainable on off-the-shelf hardware, such as a standard laptop. Experiments were conducted on a consumer-grade laptop with an NVIDIA RTX 2000 Ada Mobile GPU, ensuring practical relevance to resource-constrained settings. |
| Software Dependencies | No | The paper mentions being "implemented on top of the STORM infrastructure (Zhang et al., 2023)" and references an implementation using "pure PyTorch and MLX" for a specific experiment. However, it does not provide specific version numbers for key software components used for the main Drama implementation. |
| Experiment Setup | Yes | Appendix A.4 LOSS AND HYPERPARAMETERS details the hyperparameters for various components: A.4.1 VARIATIONAL AUTOENCODER (Table 5), A.4.2 MAMBA AND MAMBA-2 (Table 6), A.4.3 REWARD AND TERMINATION PREDICTION HEADS (Table 7), and A.4.4 ACTOR CRITIC HYPERPARAMETERS (Table 8). These tables specify values like learning rate, frame shape, layers, filters, stride, kernel, weight decay, activation functions, hidden state dimensions, gamma, lambda, entropy coefficient, max gradient norm, and batch sizes. |
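The table above refers to normalised mean/median scores and a 5-episode running average of training performance. A minimal sketch of how these are typically computed for Atari100k is shown below; the function and class names are illustrative, and the standard human-normalised formula `(agent - random) / (human - random)` is assumed, not quoted from the paper.

```python
from collections import deque


def human_normalised_score(score, random_score, human_score):
    """Standard Atari human-normalised score, reported as a percentage:
    100 * (agent - random) / (human - random)."""
    return 100.0 * (score - random_score) / (human_score - random_score)


class RunningAverage:
    """Running average over the last k episode returns
    (k = 5 in the evaluation protocol described above)."""

    def __init__(self, k=5):
        self.window = deque(maxlen=k)

    def update(self, episode_return):
        self.window.append(episode_return)
        return sum(self.window) / len(self.window)
```

Per-game normalised scores are then aggregated across games with the mean (the "Normalised Mean" in Table 1) or the median, and across the 5 independent seeds.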