Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL

Authors: Ghada Sokar, Johan S. Obando-Ceron, Aaron Courville, Hugo Larochelle, Pablo Samuel Castro

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a series of analyses to identify the key factors that allow soft Mixture of Experts (SoftMoEs) to effectively scale RL agents. We evaluate performance on the same 20 games of the Arcade Learning Environment (ALE) (Bellemare et al., 2013) benchmark used by Obando Ceron* et al. (2024) for direct comparison, with 5 independent seeds for each experiment. However, we include the results of our main findings on the full suite of 60 games. We will use the notation SoftMoE-n to denote a SoftMoE architecture with n experts. All experiments were run on Tesla P100 GPUs... Following the guidelines suggested by Agarwal et al. (2021), we report human-normalized aggregated interquartile mean (IQM), regular mean, median, and optimality gap, with error bars indicating 95% stratified bootstrap confidence intervals.
Researcher Affiliation | Collaboration | Ghada Sokar, Google DeepMind, EMAIL; Johan Obando-Ceron, Mila, Université de Montréal, EMAIL; Aaron Courville, Mila, Université de Montréal, EMAIL; Hugo Larochelle, Google DeepMind, EMAIL; Pablo Samuel Castro, Google DeepMind and Mila, Université de Montréal, EMAIL
Pseudocode | No | The paper includes diagrams illustrating network architectures (e.g., Figure 2) but does not contain any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper states: "All experiments were run on Tesla P100 GPUs using the same Dopamine library (Castro et al., 2018) used by Obando Ceron* et al. (2024);" This refers to a third-party library used by the authors, not a release of the authors' own implementation or code specific to this paper's methodology. There is no explicit statement of code release or a repository link provided.
Open Datasets | Yes | We evaluate performance on the same 20 games of the Arcade Learning Environment (ALE) (Bellemare et al., 2013) benchmark used by Obando Ceron* et al. (2024) for direct comparison, with 5 independent seeds for each experiment.
Dataset Splits | Yes | We evaluate performance on the same 20 games of the Arcade Learning Environment (ALE) (Bellemare et al., 2013) benchmark used by Obando Ceron* et al. (2024) for direct comparison, with 5 independent seeds for each experiment. However, we include the results of our main findings on the full suite of 60 games. ...each run with 200 million environment steps took between six and eight days.
Hardware Specification | Yes | All experiments were run on Tesla P100 GPUs using the same Dopamine library (Castro et al., 2018) used by Obando Ceron* et al. (2024);
Software Dependencies | No | The paper mentions the "Dopamine library (Castro et al., 2018)" and references tools such as "NumPy (Harris et al., 2020), Matplotlib (Hunter, 2007) and JAX (Bradbury et al., 2018)". However, it does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | Appendix A: EXPERIMENTAL DETAILS, Table 1: Default hyper-parameters setting for DQN, Rainbow and DER agents. This table lists specific hyper-parameters such as Adam's ϵ, Adam's learning rate, Batch Size, Conv. Activation Function, Convolutional Width, Dense Activation Function, Dense Width, Normalization, Discount Factor, Exploration ϵ, Exploration ϵ decay, Minimum Replay History, Number of Atoms, Number of Convolutional Layers, Number of Dense Layers, Replay Capacity, Reward Clipping, Update Horizon, Update Period, Weight Decay, and Sticky Actions.
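The evaluation protocol quoted above (Agarwal et al., 2021) aggregates human-normalized scores with an interquartile mean (IQM) and 95% stratified bootstrap confidence intervals. A minimal sketch of that pipeline, using toy data in place of real per-game returns; the reference scores, seed counts, and all numeric values here are illustrative, not the paper's results:

```python
import numpy as np

rng = np.random.default_rng(0)

def human_normalized(score, random_score, human_score):
    # Standard ALE normalization: 0 = random policy, 1 = human-level play.
    return (score - random_score) / (human_score - random_score)

def iqm(scores):
    # Interquartile mean: average of the middle 50% of values.
    lo, hi = np.percentile(scores, [25, 75])
    mid = scores[(scores >= lo) & (scores <= hi)]
    return mid.mean()

def stratified_bootstrap_iqm_ci(score_matrix, n_boot=2000):
    # score_matrix: (n_games, n_seeds) of human-normalized scores.
    # Stratified bootstrap: resample seeds independently within each game,
    # then recompute the aggregate statistic on each resample.
    n_games, n_seeds = score_matrix.shape
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_seeds, size=(n_games, n_seeds))
        sample = np.take_along_axis(score_matrix, idx, axis=1)
        stats.append(iqm(sample.ravel()))
    return np.percentile(stats, [2.5, 97.5])

# Toy data: 20 games x 5 seeds, matching the paper's evaluation shape.
scores = rng.normal(loc=1.2, scale=0.4, size=(20, 5))
point = iqm(scores.ravel())
ci_lo, ci_hi = stratified_bootstrap_iqm_ci(scores)
```

The `rliable` library released alongside Agarwal et al. (2021) provides production implementations of these aggregates; this sketch only shows the shape of the computation.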
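Among the hyper-parameters listed from Table 1 are an exploration ε and its decay schedule. Dopamine-style DQN agents typically decay ε linearly from 1.0 to a small final value after a warmup period; the sketch below shows that schedule shape (the constants in the usage example are illustrative, not the paper's settings):

```python
def linearly_decaying_epsilon(step, warmup_steps, decay_period, final_epsilon):
    # Epsilon stays at 1.0 during warmup, then decays linearly over
    # decay_period steps until it reaches and holds final_epsilon.
    steps_left = decay_period + warmup_steps - step
    bonus = (1.0 - final_epsilon) * steps_left / decay_period
    bonus = min(max(bonus, 0.0), 1.0 - final_epsilon)
    return final_epsilon + bonus

# Illustrative schedule: 1k warmup steps, 10k-step decay, floor of 0.01.
eps_start = linearly_decaying_epsilon(0, 1000, 10000, 0.01)
eps_mid = linearly_decaying_epsilon(6000, 1000, 10000, 0.01)
eps_end = linearly_decaying_epsilon(11000, 1000, 10000, 0.01)
```

Sticky actions (also listed in Table 1) are a separate, environment-side source of stochasticity in the ALE and are independent of this agent-side schedule.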