Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL

Authors: Ghada Sokar, Johan S. Obando-Ceron, Aaron Courville, Hugo Larochelle, Pablo Samuel Castro

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a series of analyses to identify the key factors that allow soft Mixture of Experts (SoftMoEs) to effectively scale RL agents. We evaluate performance on the same 20 games of the Arcade Learning Environment (ALE) (Bellemare et al., 2013) benchmark used by Obando Ceron* et al. (2024) for direct comparison, with 5 independent seeds for each experiment. However, we include the results of our main findings on the full suite of 60 games. We will use the notation SoftMoE-n to denote a SoftMoE architecture with n experts. All experiments were run on Tesla P100 GPUs... Following the guidelines suggested by Agarwal et al. (2021), we report human-normalized aggregated interquartile mean (IQM), regular mean, median, and optimality gap, with error bars indicating 95% stratified bootstrap confidence intervals.
Researcher Affiliation | Collaboration | Ghada Sokar, Google DeepMind, EMAIL; Johan Obando-Ceron, Mila, Université de Montréal, EMAIL; Aaron Courville, Mila, Université de Montréal, EMAIL; Hugo Larochelle, Google DeepMind, EMAIL; Pablo Samuel Castro, Google DeepMind and Mila, Université de Montréal, EMAIL
Pseudocode | No | The paper includes diagrams illustrating network architectures (e.g., Figure 2) but does not contain any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper states: "All experiments were run on Tesla P100 GPUs using the same Dopamine library (Castro et al., 2018) used by Obando Ceron* et al. (2024);" This refers to a third-party library used by the authors, not a release of the authors' own implementation or code specific to this paper's methodology. There is no explicit statement of code release or a repository link provided.
Open Datasets | Yes | We evaluate performance on the same 20 games of the Arcade Learning Environment (ALE) (Bellemare et al., 2013) benchmark used by Obando Ceron* et al. (2024) for direct comparison, with 5 independent seeds for each experiment.
Dataset Splits | Yes | We evaluate performance on the same 20 games of the Arcade Learning Environment (ALE) (Bellemare et al., 2013) benchmark used by Obando Ceron* et al. (2024) for direct comparison, with 5 independent seeds for each experiment. However, we include the results of our main findings on the full suite of 60 games. ...each run with 200 million environment steps took between six and eight days.
Hardware Specification | Yes | All experiments were run on Tesla P100 GPUs using the same Dopamine library (Castro et al., 2018) used by Obando Ceron* et al. (2024);
Software Dependencies | No | The paper mentions the "Dopamine library (Castro et al., 2018)" and references tools such as "NumPy (Harris et al., 2020), Matplotlib (Hunter, 2007) and JAX (Bradbury et al., 2018)". However, it does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | Appendix A: EXPERIMENTAL DETAILS, Table 1: Default hyper-parameters setting for DQN, Rainbow and DER agents. This table lists specific hyper-parameters such as Adam's ϵ, Adam's learning rate, Batch Size, Conv. Activation Function, Convolutional Width, Dense Activation Function, Dense Width, Normalization, Discount Factor, Exploration ϵ, Exploration ϵ decay, Minimum Replay History, Number of Atoms, Number of Convolutional Layers, Number of Dense Layers, Replay Capacity, Reward Clipping, Update Horizon, Update Period, Weight Decay, and Sticky Actions.
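The evaluation protocol quoted above (Agarwal et al., 2021) aggregates human-normalized scores with an interquartile mean (IQM) and 95% stratified bootstrap confidence intervals. A minimal sketch of that pipeline, using toy data in place of real per-game returns; the reference scores, seed counts, and all numeric values here are illustrative, not the paper's results:

```python
import numpy as np

rng = np.random.default_rng(0)

def human_normalized(score, random_score, human_score):
    # Standard ALE normalization: 0 = random policy, 1 = human-level play.
    return (score - random_score) / (human_score - random_score)

def iqm(scores):
    # Interquartile mean: average of the middle 50% of values.
    lo, hi = np.percentile(scores, [25, 75])
    mid = scores[(scores >= lo) & (scores <= hi)]
    return mid.mean()

def stratified_bootstrap_iqm_ci(score_matrix, n_boot=2000):
    # score_matrix: (n_games, n_seeds) of human-normalized scores.
    # Stratified bootstrap: resample seeds independently within each game,
    # then recompute the aggregate statistic on each resample.
    n_games, n_seeds = score_matrix.shape
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_seeds, size=(n_games, n_seeds))
        sample = np.take_along_axis(score_matrix, idx, axis=1)
        stats.append(iqm(sample.ravel()))
    return np.percentile(stats, [2.5, 97.5])

# Toy data: 20 games x 5 seeds, matching the paper's evaluation shape.
scores = rng.normal(loc=1.2, scale=0.4, size=(20, 5))
point = iqm(scores.ravel())
ci_lo, ci_hi = stratified_bootstrap_iqm_ci(scores)
```

The `rliable` library released alongside Agarwal et al. (2021) provides production implementations of these aggregates; this sketch only shows the shape of the computation.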
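Among the hyper-parameters listed from Table 1 are an exploration ε and its decay schedule. Dopamine-style DQN agents typically decay ε linearly from 1.0 to a small final value after a warmup period; the sketch below shows that schedule shape (the constants in the usage example are illustrative, not the paper's settings):

```python
def linearly_decaying_epsilon(step, warmup_steps, decay_period, final_epsilon):
    # Epsilon stays at 1.0 during warmup, then decays linearly over
    # decay_period steps until it reaches and holds final_epsilon.
    steps_left = decay_period + warmup_steps - step
    bonus = (1.0 - final_epsilon) * steps_left / decay_period
    bonus = min(max(bonus, 0.0), 1.0 - final_epsilon)
    return final_epsilon + bonus

# Illustrative schedule: 1k warmup steps, 10k-step decay, floor of 0.01.
eps_start = linearly_decaying_epsilon(0, 1000, 10000, 0.01)
eps_mid = linearly_decaying_epsilon(6000, 1000, 10000, 0.01)
eps_end = linearly_decaying_epsilon(11000, 1000, 10000, 0.01)
```

Sticky actions (also listed in Table 1) are a separate, environment-side source of stochasticity in the ALE and are independent of this agent-side schedule.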