Sparsity-Driven Plasticity in Multi-Task Reinforcement Learning
Authors: Aleksandar Todorov, Juan Cardenas-Cartagena, Rafael F. Cunha, Marco Zullich, Matthia Sabatelli
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate these approaches across distinct MTRL architectures (shared backbone, Mixture of Experts, Mixture of Orthogonal Experts) on standardized MTRL benchmarks, comparing against dense baselines and a comprehensive range of alternative plasticity-inducing or regularization methods. Our results demonstrate that both GMP and SET effectively mitigate key indicators of plasticity degradation, such as neuron dormancy and representational collapse. These plasticity improvements often correlate with enhanced multi-task performance, with sparse agents frequently outperforming dense counterparts and achieving competitive results against explicit plasticity interventions. |
| Researcher Affiliation | Academia | Aleksandar Todorov, Juan Cardenas-Cartagena, Rafael F. Cunha, Marco Zullich, Matthia Sabatelli; University of Groningen, Groningen, The Netherlands |
| Pseudocode | No | The paper describes algorithms like Gradual Magnitude Pruning (GMP) and Sparse Evolutionary Training (SET) in text paragraphs (Sections C.4 and C.5) but does not present them as structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The full implementation is available at https://github.com/atodorov284/sparsity_driven_plasticity. |
| Open Datasets | Yes | Environment and Benchmarks: We mostly consider the three multi-task Mini Grid (Chevalier-Boisvert et al., 2023) benchmarks proposed by Hendawy et al. (2024), MT3, MT5, and MT7; the exception is the results presented in Section 4.3, which use the Meta World MT10 benchmark (Yu et al., 2021). |
| Dataset Splits | No | The paper describes using the Mini Grid and Meta World MT10 benchmarks, where tasks are sampled randomly with replacement during training and evaluation runs a fixed number of episodes per task. It does not provide training/validation/test splits of a fixed dataset in the traditional sense. |
| Hardware Specification | No | The paper does not explicitly mention the specific hardware (e.g., GPU/CPU models, memory) used to run the experiments. It only refers to the training environment without detailing the computing infrastructure. |
| Software Dependencies | No | The paper mentions using the mushroom_rl library and the Adam optimizer, but it does not specify their exact version numbers. It also refers to the rliable library without a version. |
| Experiment Setup | Yes | Appendix A provides detailed hyperparameters in Table 2 ('Core experimental setup, agent architecture, and algorithm hyperparameters on Mini Grid') and Table 3 ('The hyperparameters and training setup used for MTMH SAC on Meta World MT10'), covering aspects like number of environments, steps per epoch, total timesteps, train frequency, evaluation episodes/frequency, optimizer details (Adam, learning rates), network architecture (Conv2D channels, kernel sizes, activations, hidden sizes), GAE λ, Entropy Term Coefficient, Clipping ε, Epochs for Policy/Critic, Batch Size, Discount Factor, and specific parameters for MoE/MOORE, MTMH SAC, and sparsity methods. |
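Since the paper describes GMP and SET only in prose, the two sparsity methods can be summarized with a generic sketch. The schedule below follows the standard polynomial GMP formulation (Zhu & Gupta, 2017) and the prune-and-regrow step of SET (Mocanu et al., 2018); all function names and the `zeta` parameter are illustrative, not taken from the authors' implementation:

```python
import numpy as np

def gmp_sparsity(step, start_step, end_step, final_sparsity, initial_sparsity=0.0):
    """Polynomial sparsity schedule for Gradual Magnitude Pruning."""
    if step < start_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    frac = (step - start_step) / (end_step - start_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - frac) ** 3

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights."""
    k = int(round(sparsity * weights.size))
    if k == 0:
        return weights.copy()
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return weights * (np.abs(weights) > threshold)

def set_step(weights, mask, zeta=0.3, rng=None):
    """One SET update: drop a fraction zeta of the smallest active
    weights, then regrow the same number at random inactive positions."""
    rng = rng or np.random.default_rng()
    flat_mask = mask.ravel().copy()
    active = np.flatnonzero(flat_mask)
    n_prune = int(zeta * active.size)
    # prune the smallest-magnitude active connections
    mags = np.abs(weights.ravel()[active])
    flat_mask[active[np.argsort(mags)[:n_prune]]] = False
    # regrow at random currently-inactive positions
    inactive = np.flatnonzero(~flat_mask)
    flat_mask[rng.choice(inactive, size=n_prune, replace=False)] = True
    return flat_mask.reshape(mask.shape)
```

Note that the total number of active connections is preserved by `set_step`, which is what keeps the parameter budget constant while the topology evolves during training.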