Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn
Authors: Hongyao Tang, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Glen Berseth
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate our approach under extreme non-stationarity, we conduct the experiments in various continual RL environments built on OpenAI Gym Control (Brockman et al., 2016), ProcGen (Cobbe et al., 2020), DeepMind Control Suite (Tassa et al., 2018), and MinAtar (Young & Tian, 2019) benchmarks, with a total of 24 continual RL environments. The results show that reducing churn effectively improves the agent's performance in continual RL, and outperforms related methods in most environments. In our experiments, we first evaluate C-CHAIN in comparison with recent related methods, to find out whether it improves continual RL (Section 5.1). Then, we conduct the empirical analysis to examine whether reducing churn prevents the rank decrease and how the two effects contribute differently (Section 5.2). Finally, we extend the evaluation to more continual learning settings (Section 5.3). |
| Researcher Affiliation | Academia | 1Mila – Québec AI Institute, 2Université de Montréal. Correspondence to: Hongyao Tang <EMAIL>, Glen Berseth <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Deep Continual RL with Continual Churn Approximated Reduction (C-CHAIN). |
| Open Source Code | Yes | We propose C-CHAIN and demonstrate it effectively mitigates the loss of plasticity and outperforms prior methods in a range of continual RL settings. Code: https://github.com/bluecontra/C-CHAIN |
| Open Datasets | Yes | To evaluate our approach under extreme non-stationarity, we conduct the experiments in various continual RL environments built on OpenAI Gym Control (Brockman et al., 2016), ProcGen (Cobbe et al., 2020), DeepMind Control Suite (Tassa et al., 2018), and MinAtar (Young & Tian, 2019) benchmarks, with a total of 24 continual RL environments. We extend our empirical evaluation of C-CHAIN to the continual supervised learning setting. We follow the settings in L2 Init (Kumar et al., 2023b) and adopt Random Label-MNIST and Permuted-MNIST as our testbed. |
| Dataset Splits | No | The paper describes how tasks are constructed (e.g., chaining environment instances with noise, procedural generation for Proc Gen, random sampling from MNIST for supervised tasks) and the budget of interactions/epochs per task. However, it does not explicitly provide specific training/test/validation split percentages or sample counts for the data used within each individual task instance. |
| Hardware Specification | Yes | For the continual Gym Control and ProcGen experiments, we allocate a single V100 GPU, 16 CPUs and 32GB memory for 4 to 6 jobs, typically running for around 4 hours for Gym Control and 20 hours for ProcGen to [...] |
| Software Dependencies | No | The paper mentions using Python, NumPy, Matplotlib, Jupyter, and Pandas, and refers to codebases like TRAC, CleanRL, and MinAtar, but does not provide specific version numbers for these software components or libraries. |
| Experiment Setup | Yes | For the hyperparameters of the PPO baseline agent, we use the default values recommended in the code base and keep them consistent across all the methods. For the hyperparameters specific to each method, we search around the recommendation values in the original papers and report the best. More experiment details are provided in Appendix A. The hyperparameters are provided in Table 5. Table 5 specifies detailed hyperparameters for PPO and C-CHAIN, including Learning Rate, Discount Factor (γ), Mini-batch Size, Update Epoch, Clipping Range Parameter (ϵ), and other method-specific parameters. |
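The Open Datasets row notes that the continual supervised learning tasks follow L2 Init and use Permuted-MNIST. That benchmark is conventionally built by fixing one random pixel permutation per task and applying it to every image in that task. Below is a minimal sketch of that construction; the function names and the synthetic stand-in batch are illustrative assumptions, not code from the paper's repository:

```python
import numpy as np

def make_permutation_tasks(num_tasks: int, input_dim: int = 28 * 28, seed: int = 0):
    """Return one fixed pixel permutation per task (Permuted-MNIST style)."""
    rng = np.random.default_rng(seed)
    return [rng.permutation(input_dim) for _ in range(num_tasks)]

def apply_task(images: np.ndarray, perm: np.ndarray) -> np.ndarray:
    """Apply a task's pixel permutation to a batch of flattened images."""
    return images[:, perm]

# Synthetic stand-in for flattened MNIST digits (real data would be loaded instead).
batch = np.arange(4 * 784, dtype=np.float32).reshape(4, 784)
tasks = make_permutation_tasks(num_tasks=3)
permuted = apply_task(batch, tasks[0])
assert permuted.shape == batch.shape
```

Because each task reuses its own permutation, inputs stay consistent within a task while the input distribution shifts abruptly at task boundaries, which is what stresses plasticity.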