Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning

Authors: Guozheng Ma, Lu Li, Zilin Wang, Li Shen, Pierre-Luc Bacon, Dacheng Tao

ICML 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Through a series of ablation and scaling studies we reveal a clear and definitive answer: YES! As summarized in Figure 1, sparse networks continue to show performance gains well beyond the point where their dense counterparts hit scaling limits, demonstrating superior parameter efficiency and enhanced scalability at larger model sizes. Subsequently, Section 4 delves into why introducing sparsity can break through current scaling barriers by leveraging a range of empirical metrics as diagnostic tools. Our analysis reveals that while larger model sizes tend to induce more severe optimization pathologies, appropriate network sparsity effectively counteracts these negative effects by preventing capacity and plasticity loss (Klein et al., 2024), constraining parameter growth (Lyle et al., 2024b), enhancing simplicity bias (Lee et al., 2024), and mitigating gradient interference (Lyle et al., 2023).Furthermore, in Section 5, we extend our empirical evaluation to visual RL and streaming RL, demonstrating that the benefits of network sparsity consistently generalize across diverse RL setups.
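To make concrete what "introducing network sparsity" means in this context, the sketch below applies a static random sparsity mask to a weight matrix. This is a minimal illustration of one common sparsification scheme, not necessarily the exact scheme the paper uses (which may involve structured or dynamically updated masks; see the paper for details):

```python
import numpy as np

def random_sparse_mask(shape, sparsity, rng):
    """Static random mask: a `sparsity` fraction of weights is zeroed
    once and stays zero for the whole run. Training then reuses the
    same mask in every forward and backward pass."""
    n = int(np.prod(shape))
    k = int(round(sparsity * n))
    mask = np.ones(n)
    mask[rng.choice(n, size=k, replace=False)] = 0.0
    return mask.reshape(shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))          # dense layer weights
mask = random_sparse_mask(w.shape, sparsity=0.9, rng=rng)
w_sparse = w * mask                      # ~90% of weights are zero
print((w_sparse == 0).mean())
```

The key property being tested in the paper is that, at a fixed parameter budget, a wide-but-sparse network like this scales better than a smaller dense one.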
Researcher Affiliation | Academia | 1 Nanyang Technological University, 2 Mila - Quebec AI Institute, 3 Université de Montréal, 4 University of Oxford. Correspondence to: Li Shen <EMAIL>, Dacheng Tao <EMAIL>.
Pseudocode | No | The paper describes methods and calculations (e.g., Srank, dormant ratio, gradient covariance matrices) using mathematical formulas, but it does not contain any structured pseudocode or algorithm blocks.
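One of the diagnostic metrics named above, the dormant ratio, can be sketched in a few lines. The sketch follows the standard definition from the dormant-neuron literature (a neuron is dormant when its mean absolute activation, normalized by the layer-wide mean, falls below a threshold τ); the threshold value here is an illustrative assumption, not taken from the paper:

```python
import numpy as np

def dormant_ratio(activations, tau=0.025):
    """Fraction of dormant neurons in one layer.

    activations: array of shape (batch, n_neurons), post-activation outputs.
    A neuron counts as dormant when its mean |activation|, normalized by
    the layer-wide mean, is at or below the threshold tau.
    """
    mean_abs = np.abs(activations).mean(axis=0)       # per-neuron score
    scores = mean_abs / (mean_abs.mean() + 1e-8)      # normalize by layer mean
    return float((scores <= tau).mean())

# Example: a layer where half the neurons barely fire
acts = np.concatenate([np.zeros((64, 8)), np.ones((64, 8))], axis=1)
print(dormant_ratio(acts))  # -> 0.5
```

A rising dormant ratio during training is commonly read as a sign of plasticity loss, which is why it serves as a diagnostic when comparing sparse and dense networks.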
Open Source Code | Yes | Our code is publicly available at GitHub.
Open Datasets | Yes | We conducted extensive experiments on several of the most challenging DeepMind Control (DMC) (Tassa et al., 2018) tasks... We conducted visual RL experiments on DMC using image input as the observation... We conducted streaming RL experiments on two MuJoCo robot locomotion tasks (Todorov et al., 2012)... We conducted Atari experiments on the Atari-100k benchmark (Kaiser et al., 2020).
Dataset Splits | No | The paper uses standard benchmarks such as DeepMind Control (DMC) and Atari-100k, but it does not explicitly describe training, validation, or test splits (percentages, sample counts, or references to predefined splits) for its experimental setup.
Hardware Specification | No | The paper mentions "This research is enabled in part by compute resources, software and technical help provided by Mila (mila.quebec)" in the Acknowledgment section. However, it does not specify any concrete hardware details such as GPU models, CPU types, or memory amounts used for the experiments.
Software Dependencies | No | The paper mentions various algorithms and frameworks used, such as Soft Actor-Critic (SAC), Deep Deterministic Policy Gradient (DDPG), the SimBa architecture, DrQ-v2, the Stream AC(λ) algorithm, Dopamine, Data-Efficient Rainbow (DER), and the Impala-CNN architecture. However, it does not provide specific version numbers for any of these software components or underlying libraries (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Experimental Setup. Introducing network sparsity when scaling up model size requires careful control of multiple variables in our comparative experiments, including the width and depth of both actor and critic networks, as well as their respective sparsity levels... Complete experimental details are provided in Appendix B.1. Table 2. SAC hyperparameters. The hyperparameters listed below are used consistently across all experiments in Section 3 and Section 4. For the discount factor, we follow Lee et al. (2024) using heuristics used by TD-MPC2 (Hansen et al., 2023). Table 3. DDPG hyperparameters. The hyperparameters listed below are used consistently across all experiments in Section 3 and Section 4. For the discount factor, we follow Lee et al. (2024) using heuristics used by TD-MPC2 (Hansen et al., 2023).
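The discount-factor heuristic attributed above to TD-MPC2 scales the discount with episode length. The sketch below follows the form used in TD-MPC2's public code; the constants (denominator 5, clip range [0.95, 0.995]) are assumptions recalled from that codebase, not values confirmed by this paper, so the authors' Appendix B.1 should be treated as authoritative:

```python
def heuristic_discount(episode_length, denom=5, lo=0.95, hi=0.995):
    """Episode-length-based discount heuristic in the style of TD-MPC2.

    Longer episodes get a discount closer to 1 so that rewards far in the
    future still matter; the result is clipped to [lo, hi].
    """
    horizon = episode_length / denom
    return min(max((horizon - 1) / horizon, lo), hi)

print(heuristic_discount(1000))  # long DMC episodes -> capped at 0.995
print(heuristic_discount(100))   # short episodes -> floored at 0.95
```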