Lifelong Reinforcement Learning with Modulating Masks
Authors: Eseoghene Ben-Iwhiwhu, Saptarshi Nath, Praveen Kumar Pilly, Soheil Kolouri, Andrea Soltoggio
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The approach is compared with LRL baselines in both discrete and continuous RL tasks, showing superior performance. The authors further investigate a linear combination of previously learned masks to exploit prior knowledge when learning new tasks: not only is learning faster, but the algorithm also solves tasks that could not otherwise be solved from scratch due to extremely sparse rewards. The results suggest that RL with modulating masks is a promising approach to lifelong learning, to the composition of knowledge for learning increasingly complex tasks, and to knowledge reuse for efficient and faster learning. The three novel approaches, Mask RI, Mask LC, and Mask BLC, are tested on a set of LRL benchmarks across discrete and continuous action-space environments. The metrics report a lifelong evaluation across all tasks at different points during lifelong training, computed as the average sum of reward obtained across all tasks in the curriculum; the area under the curve (AUC) is reported in the corresponding tables. |
| Researcher Affiliation | Collaboration | Eseoghene Ben-Iwhiwhu (EMAIL), Department of Computer Science, Loughborough University, UK; Saptarshi Nath (EMAIL), Department of Computer Science, Loughborough University, UK; Praveen K. Pilly (EMAIL), HRL Laboratories, LLC, Malibu, CA 90265, USA; Soheil Kolouri (EMAIL), Department of Computer Science, Vanderbilt University, Nashville, TN, USA; Andrea Soltoggio (EMAIL), Department of Computer Science, Loughborough University, UK |
| Pseudocode | Yes | Algorithm 1: Lifelong RL Algorithm with modulating masks; Algorithm 2: Forward pass in network layer l in Mask LC |
| Open Source Code | Yes | To ensure reproducibility, the hyperparameters for the experiments are reported in Appendix B. The code is published at https://github.com/dlpbc/mask-lrl. |
| Open Datasets | Yes | To demonstrate the first point, we test the approach on RL curricula with the Minigrid environment (Chevalier-Boisvert et al., 2018), the CT-graph (Soltoggio et al., 2019; 2023), Metaworld (Yu et al., 2020), and Procgen (Cobbe et al., 2020), and assess the lifelong learning metrics (New et al., 2022; Baker et al., 2023) when learning multiple tasks in sequence. We evaluated the novel methods in a robotics environment with continuous action space, the Continual World (Wołczyk et al., 2021). |
| Dataset Splits | Yes | For each task, agents are trained on 200 levels. However, the evaluation is carried out on the distribution of all levels, which is a combination of levels seen and unseen during training. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions using specific algorithms like PPO (Schulman et al., 2017) and IMPALA (Espeholt et al., 2018) but does not provide specific version numbers for these or other software dependencies, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | To ensure reproducibility, the hyperparameters for the experiments are reported in Appendix B. |
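The Mask LC mechanism referenced in the table (Algorithm 2: a forward pass in which previously learned per-task score matrices are linearly combined, binarized into a mask, and applied to fixed backbone weights) can be sketched in plain Python. All names here (`masked_forward`, `binarize`, the `betas` coefficients) are illustrative assumptions rather than the authors' code; in the paper the mixing coefficients are learned during training, whereas they are fixed in this sketch.

```python
def binarize(score, threshold=0.0):
    """Hard-threshold a real-valued score into a binary mask entry."""
    return 1.0 if score > threshold else 0.0

def masked_forward(weights, task_scores, betas, x, threshold=0.0):
    """Sketch of a Mask-LC-style forward pass for one linear layer.

    weights:     fixed backbone weights, shape [out][in] (nested lists)
    task_scores: one real-valued [out][in] score matrix per previous task
    betas:       mixing coefficients over tasks (learned in the paper;
                 fixed here purely for illustration)
    x:           input vector of length [in]

    The effective mask is the binarized linear combination of the
    per-task scores; the backbone weights themselves are never updated.
    """
    out_dim, in_dim = len(weights), len(weights[0])
    y = [0.0] * out_dim
    for i in range(out_dim):
        for j in range(in_dim):
            # Linearly combine previous tasks' scores for this weight ...
            combined = sum(b * s[i][j] for b, s in zip(betas, task_scores))
            # ... binarize, and modulate the fixed weight with the mask.
            y[i] += weights[i][j] * binarize(combined, threshold) * x[j]
    return y
```

For example, with two previous tasks mixed equally (`betas = [0.5, 0.5]`), both combined scores for a 1x2 layer can land above the threshold, so both backbone weights pass through the mask unchanged.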
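The lifelong evaluation metric described in the table (average reward across all tasks in the curriculum, measured at several checkpoints during training, with an AUC summarizing the resulting curve) can be sketched as follows. The function name and the unit-spaced sum used for the AUC are assumptions for illustration; the quoted text does not fix the exact integration rule.

```python
def lifelong_evaluation(reward_matrix):
    """Sketch of a lifelong evaluation metric (illustrative, not the
    authors' code).

    reward_matrix[c][t] is the evaluation reward on task t at
    checkpoint c, where every checkpoint evaluates ALL tasks in the
    curriculum (including tasks not yet trained on).

    Returns the per-checkpoint average reward across tasks and a simple
    area-under-the-curve summary (a sum with unit checkpoint spacing --
    an assumption made here for simplicity).
    """
    per_checkpoint = [sum(row) / len(row) for row in reward_matrix]
    auc = sum(per_checkpoint)
    return per_checkpoint, auc
```

For instance, a two-task curriculum evaluated at three checkpoints, where the agent masters one task and then both, yields per-checkpoint averages of 0.0, 0.5, and 1.0.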