Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn
Authors: Hongyao Tang, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Glen Berseth
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate our approach under extreme non-stationarity, we conduct the experiments in various continual RL environments built on OpenAI Gym Control (Brockman et al., 2016), ProcGen (Cobbe et al., 2020), DeepMind Control Suite (Tassa et al., 2018), and MinAtar (Young & Tian, 2019) benchmarks, with a total of 24 continual RL environments. The results show that reducing churn effectively improves the agent's performance in continual RL, and outperforms related methods in most environments. In our experiments, we first evaluate C-CHAIN in comparison with recent related methods, to find out whether it improves continual RL (Section 5.1). Then, we conduct the empirical analysis to examine whether reducing churn prevents the rank decrease and how the two effects contribute differently (Section 5.2). Finally, we extend the evaluation to more continual learning settings (Section 5.3). |
| Researcher Affiliation | Academia | 1Mila – Québec AI Institute, 2Université de Montréal. Correspondence to: Hongyao Tang <EMAIL>, Glen Berseth <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Deep Continual RL with Continual Churn Approximated Reduction (C-CHAIN). |
| Open Source Code | Yes | We propose C-CHAIN and demonstrate it effectively mitigates the loss of plasticity and outperforms prior methods in a range of continual RL settings. Code: https://github.com/bluecontra/C-CHAIN |
| Open Datasets | Yes | To evaluate our approach under extreme non-stationarity, we conduct the experiments in various continual RL environments built on OpenAI Gym Control (Brockman et al., 2016), ProcGen (Cobbe et al., 2020), DeepMind Control Suite (Tassa et al., 2018), and MinAtar (Young & Tian, 2019) benchmarks, with a total of 24 continual RL environments. We extend our empirical evaluation of C-CHAIN to the continual supervised learning setting. We follow the settings in L2 Init (Kumar et al., 2023b) and adopt Random Label-MNIST and Permuted-MNIST as our testbed. |
| Dataset Splits | No | The paper describes how tasks are constructed (e.g., chaining environment instances with noise, procedural generation for Proc Gen, random sampling from MNIST for supervised tasks) and the budget of interactions/epochs per task. However, it does not explicitly provide specific training/test/validation split percentages or sample counts for the data used within each individual task instance. |
| Hardware Specification | Yes | For the continual Gym Control and ProcGen experiments, we allocate a single V100 GPU, 16 CPUs and 32GB memory for 4 to 6 jobs, typically running for around 4 hours for Gym Control and 20 hours for ProcGen to [...] |
| Software Dependencies | No | The paper mentions using Python, NumPy, Matplotlib, Jupyter, and Pandas, and refers to codebases like TRAC, CleanRL, and MinAtar, but does not provide specific version numbers for these software components or libraries. |
| Experiment Setup | Yes | For the hyperparameters of the PPO baseline agent, we use the default values recommended in the code base and keep them consistent across all the methods. For the hyperparameters specific to each method, we search around the recommendation values in the original papers and report the best. More experiment details are provided in Appendix A. The hyperparameters are provided in Table 5. Table 5 specifies detailed hyperparameters for PPO and C-CHAIN, including Learning Rate, Discount Factor (γ), Mini-batch Size, Update Epoch, Clipping Range Parameter (ϵ), and other method-specific parameters. |
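The Open Datasets row notes that the continual supervised learning tasks follow L2 Init and use Permuted-MNIST. That benchmark is conventionally built by fixing one random pixel permutation per task and applying it to every image in that task. Below is a minimal sketch of that construction; the function names and the synthetic stand-in batch are illustrative assumptions, not code from the paper's repository:

```python
import numpy as np

def make_permutation_tasks(num_tasks: int, input_dim: int = 28 * 28, seed: int = 0):
    """Return one fixed pixel permutation per task (Permuted-MNIST style)."""
    rng = np.random.default_rng(seed)
    return [rng.permutation(input_dim) for _ in range(num_tasks)]

def apply_task(images: np.ndarray, perm: np.ndarray) -> np.ndarray:
    """Apply a task's pixel permutation to a batch of flattened images."""
    return images[:, perm]

# Synthetic stand-in for flattened MNIST digits (real data would be loaded instead).
batch = np.arange(4 * 784, dtype=np.float32).reshape(4, 784)
tasks = make_permutation_tasks(num_tasks=3)
permuted = apply_task(batch, tasks[0])
assert permuted.shape == batch.shape
```

Because each task reuses its own permutation, inputs stay consistent within a task while the input distribution shifts abruptly at task boundaries, which is what stresses plasticity.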