Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Building a Subspace of Policies for Scalable Continual Learning
Authors: Jean-Baptiste Gaya, Thang Doan, Lucas Caccia, Laure Soulier, Ludovic Denoyer, Roberta Raileanu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on 18 CRL scenarios from two different domains, locomotion in Brax and robotic manipulation in Continual World, a challenging CRL benchmark (Wołczyk et al., 2021). We also compare CSP with a number of popular CRL baselines, including both fixed-size and growing-size methods. |
| Researcher Affiliation | Collaboration | Jean-Baptiste Gaya, Meta AI Research; CNRS-ISIR, Sorbonne University, Paris, France (EMAIL). Thang Doan, McGill University, Mila (now at Bosch Research) (EMAIL). Lucas Caccia, McGill University, Mila (EMAIL). Laure Soulier, CNRS-ISIR, Sorbonne University, Paris, France (EMAIL). Ludovic Denoyer, Ubisoft France (EMAIL). Roberta Raileanu, Meta AI Research (EMAIL). |
| Pseudocode | Yes | Pseudo-code is available in Appendix C.1. |
| Open Source Code | Yes | Code is available here. |
| Open Datasets | Yes | We evaluate CSP on 18 CRL scenarios containing 35 different RL tasks, from two continuous control domains, locomotion in Brax (Freeman et al., 2021) and robotic manipulation in Continual World (CW, Wołczyk et al. (2021)), a challenging CRL benchmark. |
| Dataset Splits | No | The paper mentions training budgets ('budget of 1M interactions for each task') and evaluation procedures, but does not specify explicit train/validation/test data splits or proportions for the datasets used. |
| Hardware Specification | Yes | Each algorithm was trained using one Intel(R) Xeon(R) CPU core (E5-2698 v4 @ 2.20GHz) and one NVIDIA V100 GPU. |
| Software Dependencies | No | The paper states: 'All the experiments were implemented with SaLinA (Denoyer et al., 2021)... We used Soft Actor Critic (Haarnoja et al., 2018b) as the routine algorithm for each method.' However, specific version numbers for these or other software libraries (e.g., Python, PyTorch) are not provided. |
| Experiment Setup | Yes | We run a gridsearch on SAC hyper-parameters (see Table 3) on FT-N and select the best set in terms of final average performance (see Section 4 for the details about this metric). Then, we freeze these hyper-parameters and perform a specific gridsearch for CSP and each baseline (see Table 4). Each hyper-parameter set is evaluated over 10 seeds. |