Lifelong Reinforcement Learning with Modulating Masks

Authors: Eseoghene Ben-Iwhiwhu, Saptarshi Nath, Praveen Kumar Pilly, Soheil Kolouri, Andrea Soltoggio

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The comparison with LRL baselines in both discrete and continuous RL tasks shows superior performance. We further investigated the use of a linear combination of previously learned masks to exploit previous knowledge when learning new tasks: not only is learning faster, but the algorithm also solves tasks that we could not otherwise solve from scratch due to extremely sparse rewards. The results suggest that RL with modulating masks is a promising approach to lifelong learning, to the composition of knowledge to learn increasingly complex tasks, and to knowledge reuse for efficient and faster learning. The three novel approaches, Mask RI, Mask LC, and Mask BLC, are tested on a set of LRL benchmarks across discrete and continuous action-space environments. The metrics report a lifelong evaluation across all tasks at different points during the lifelong training, computed as the average sum of reward obtained across all tasks in the curriculum. The area under the curve (AUC) is reported in the corresponding tables."
Researcher Affiliation | Collaboration | Eseoghene Ben-Iwhiwhu (Department of Computer Science, Loughborough University, UK); Saptarshi Nath (Department of Computer Science, Loughborough University, UK); Praveen K. Pilly (HRL Laboratories, LLC, Malibu, CA 90265, USA); Soheil Kolouri (Department of Computer Science, Vanderbilt University, Nashville, TN, USA); Andrea Soltoggio (Department of Computer Science, Loughborough University, UK)
Pseudocode | Yes | Algorithm 1: Lifelong RL Algorithm with modulating masks; Algorithm 2: Forward pass in network layer l in Mask LC
Open Source Code | Yes | "To ensure reproducibility, the hyperparameters for the experiments are reported in Appendix B. The code is published at https://github.com/dlpbc/mask-lrl."
Open Datasets | Yes | "To demonstrate the first point, we test the approach on RL curricula with the Minigrid environment (Chevalier-Boisvert et al., 2018), the CT-graph (Soltoggio et al., 2019; 2023), Metaworld (Yu et al., 2020), and Procgen (Cobbe et al., 2020), and assess the lifelong learning metrics (New et al., 2022; Baker et al., 2023) when learning multiple tasks in sequence. We evaluated the novel methods in a robotics environment with continuous action space, the Continual World (Wołczyk et al., 2021)."
Dataset Splits | Yes | "For each task, agents are trained on 200 levels. However, the evaluation is carried out on the distribution of all levels, which is a combination of levels seen and unseen during training."
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper mentions specific algorithms such as PPO (Schulman et al., 2017) and IMPALA (Espeholt et al., 2018) but does not provide version numbers for these or for other software dependencies, libraries, or frameworks used in the implementation.
Experiment Setup | Yes | "To ensure reproducibility, the hyperparameters for the experiments are reported in Appendix B."
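To make the pseudocode row above concrete: Algorithm 2 describes a forward pass through a layer whose weights are gated by a task mask, with Mask LC building the new task's mask as a linear combination of previously learned masks. The sketch below is a minimal numpy illustration of that idea, not the authors' implementation; the function names, the threshold binarization, and the softmax parameterization of the combination coefficients are all assumptions.

```python
import numpy as np

def binarize(scores, threshold=0.0):
    """Supermask-style binarization of real-valued mask scores (assumed form)."""
    return (scores > threshold).astype(np.float64)

def masked_linear(x, weights, mask):
    """Forward pass through a layer whose fixed weights are gated by a mask."""
    return x @ (weights * mask).T

def combined_mask(prev_masks, logits):
    """Mask-LC-style mask for a new task: a softmax-weighted (convex)
    combination of masks learned on previous tasks."""
    coeffs = np.exp(logits - logits.max())
    coeffs /= coeffs.sum()
    return sum(c * m for c, m in zip(coeffs, prev_masks))

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))                                    # fixed backbone weights
masks = [binarize(rng.standard_normal((4, 3))) for _ in range(2)]  # masks from 2 prior tasks
logits = np.zeros(2)                                               # trainable combination logits
x = rng.standard_normal((5, 3))                                    # batch of 5 inputs
y = masked_linear(x, W, combined_mask(masks, logits))
print(y.shape)  # (5, 4)
```

Because the coefficients are a softmax, the combined mask stays within [0, 1] whenever the previous masks are binary, so the gated weights remain a soft interpolation of previously discovered subnetworks.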
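The evaluation protocol quoted in the Research Type row (average reward across all tasks in the curriculum at each checkpoint, summarized by an AUC) can be sketched as follows. This is an illustrative reading of the metric, assuming unit spacing between checkpoints and a trapezoidal AUC; the paper's exact computation may differ.

```python
import numpy as np

def lifelong_evaluation(reward_matrix):
    """reward_matrix[t, k]: reward on task k when evaluated at checkpoint t.
    Returns the average reward across all tasks, per checkpoint."""
    return np.asarray(reward_matrix, dtype=float).mean(axis=1)

def auc(curve):
    """Area under the lifelong-evaluation curve, trapezoidal rule
    with assumed unit spacing between checkpoints."""
    c = np.asarray(curve, dtype=float)
    return float(((c[1:] + c[:-1]) / 2.0).sum())

# Hypothetical numbers: 3 evaluation checkpoints x 3 tasks in the curriculum.
rewards = np.array([[0.2, 0.0, 0.0],
                    [0.8, 0.3, 0.1],
                    [0.9, 0.7, 0.6]])
curve = lifelong_evaluation(rewards)
print(curve, auc(curve))  # per-checkpoint average reward and its AUC
```

Evaluating all tasks at every checkpoint (rather than only the current one) is what makes the metric sensitive to both forward transfer and forgetting.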