Lifelong Reinforcement Learning with Modulating Masks

Authors: Eseoghene Ben-Iwhiwhu, Saptarshi Nath, Praveen Kumar Pilly, Soheil Kolouri, Andrea Soltoggio

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The comparison with LRL baselines in both discrete and continuous RL tasks shows superior performance. We further investigated the use of a linear combination of previously learned masks to exploit previous knowledge when learning new tasks: not only is learning faster, but the algorithm also solves tasks that we could not otherwise solve from scratch due to extremely sparse rewards. The results suggest that RL with modulating masks is a promising approach to lifelong learning, to the composition of knowledge to learn increasingly complex tasks, and to knowledge reuse for efficient and faster learning. The three novel approaches, Mask RI, Mask LC, and Mask BLC, are tested on a set of LRL benchmarks across discrete and continuous action-space environments. The metrics report a lifelong evaluation across all tasks at different points during the lifelong training, computed as the average sum of reward obtained across all tasks in the curriculum. The area under the curve (AUC) is reported in the corresponding tables."
Researcher Affiliation | Collaboration | Eseoghene Ben-Iwhiwhu (Department of Computer Science, Loughborough University, UK); Saptarshi Nath (Department of Computer Science, Loughborough University, UK); Praveen K. Pilly (HRL Laboratories, LLC, Malibu, CA 90265, USA); Soheil Kolouri (Department of Computer Science, Vanderbilt University, Nashville, TN, USA); Andrea Soltoggio (Department of Computer Science, Loughborough University, UK)
Pseudocode | Yes | Algorithm 1: Lifelong RL Algorithm with modulating masks; Algorithm 2: Forward pass in network layer l in Mask LC
Open Source Code | Yes | "To ensure reproducibility, the hyperparameters for the experiments are reported in Appendix B. The code is published at https://github.com/dlpbc/mask-lrl."
Open Datasets | Yes | "To demonstrate the first point, we test the approach on RL curricula with the Minigrid environment (Chevalier-Boisvert et al., 2018), the CT-graph (Soltoggio et al., 2019; 2023), Metaworld (Yu et al., 2020), and Procgen (Cobbe et al., 2020), and assess the lifelong learning metrics (New et al., 2022; Baker et al., 2023) when learning multiple tasks in sequence. We evaluated the novel methods in a robotics environment with continuous action space, the Continual World (Wołczyk et al., 2021)."
Dataset Splits | Yes | "For each task, agents are trained on 200 levels. However, the evaluation is carried out on the distribution of all levels, which is a combination of levels seen and unseen during training."
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper mentions specific algorithms such as PPO (Schulman et al., 2017) and IMPALA (Espeholt et al., 2018) but does not provide version numbers for these or for other software dependencies, libraries, or frameworks used in the implementation.
Experiment Setup | Yes | "To ensure reproducibility, the hyperparameters for the experiments are reported in Appendix B."
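To make the pseudocode row above concrete: Algorithm 2 describes a forward pass through a layer whose weights are gated by a task mask, with Mask LC building the new task's mask as a linear combination of previously learned masks. The sketch below is a minimal numpy illustration of that idea, not the authors' implementation; the function names, the threshold binarization, and the softmax parameterization of the combination coefficients are all assumptions.

```python
import numpy as np

def binarize(scores, threshold=0.0):
    """Supermask-style binarization of real-valued mask scores (assumed form)."""
    return (scores > threshold).astype(np.float64)

def masked_linear(x, weights, mask):
    """Forward pass through a layer whose fixed weights are gated by a mask."""
    return x @ (weights * mask).T

def combined_mask(prev_masks, logits):
    """Mask-LC-style mask for a new task: a softmax-weighted (convex)
    combination of masks learned on previous tasks."""
    coeffs = np.exp(logits - logits.max())
    coeffs /= coeffs.sum()
    return sum(c * m for c, m in zip(coeffs, prev_masks))

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))                                    # fixed backbone weights
masks = [binarize(rng.standard_normal((4, 3))) for _ in range(2)]  # masks from 2 prior tasks
logits = np.zeros(2)                                               # trainable combination logits
x = rng.standard_normal((5, 3))                                    # batch of 5 inputs
y = masked_linear(x, W, combined_mask(masks, logits))
print(y.shape)  # (5, 4)
```

Because the coefficients are a softmax, the combined mask stays within [0, 1] whenever the previous masks are binary, so the gated weights remain a soft interpolation of previously discovered subnetworks.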
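The evaluation protocol quoted in the Research Type row (average reward across all tasks in the curriculum at each checkpoint, summarized by an AUC) can be sketched as follows. This is an illustrative reading of the metric, assuming unit spacing between checkpoints and a trapezoidal AUC; the paper's exact computation may differ.

```python
import numpy as np

def lifelong_evaluation(reward_matrix):
    """reward_matrix[t, k]: reward on task k when evaluated at checkpoint t.
    Returns the average reward across all tasks, per checkpoint."""
    return np.asarray(reward_matrix, dtype=float).mean(axis=1)

def auc(curve):
    """Area under the lifelong-evaluation curve, trapezoidal rule
    with assumed unit spacing between checkpoints."""
    c = np.asarray(curve, dtype=float)
    return float(((c[1:] + c[:-1]) / 2.0).sum())

# Hypothetical numbers: 3 evaluation checkpoints x 3 tasks in the curriculum.
rewards = np.array([[0.2, 0.0, 0.0],
                    [0.8, 0.3, 0.1],
                    [0.9, 0.7, 0.6]])
curve = lifelong_evaluation(rewards)
print(curve, auc(curve))  # per-checkpoint average reward and its AUC
```

Evaluating all tasks at every checkpoint (rather than only the current one) is what makes the metric sensitive to both forward transfer and forgetting.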