Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Neuroplastic Expansion in Deep Reinforcement Learning
Authors: Jiashun Liu, Johan S. Obando-Ceron, Aaron Courville, Ling Pan
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that NE effectively mitigates plasticity loss and outperforms state-of-the-art methods across various tasks in MuJoCo and DeepMind Control Suite environments. |
| Researcher Affiliation | Academia | Jiashun Liu, HKUST; Johan Obando-Ceron, Mila – Québec AI Institute, Université de Montréal; Aaron Courville, Mila – Québec AI Institute, Université de Montréal; Ling Pan, HKUST. Corresponding author, email: EMAIL |
| Pseudocode | Yes | Appendix D (Pseudocode): D.1 Pseudo-code for NE — Algorithm 1 Neuroplastic Expansion TD3 (...); D.2 Pseudo-code for Truncate Process — Algorithm 2 Truncate Process |
| Open Source Code | Yes | We make our code publicly available. |
| Open Datasets | Yes | Extensive experiments demonstrate that NE effectively mitigates plasticity loss and outperforms state-of-the-art methods across various tasks in MuJoCo and DeepMind Control Suite environments. (...) We conduct a series of experiments based on the standard continuous control tasks from OpenAI Gym (Brockman, 2016) simulated by MuJoCo (Todorov et al., 2012) with long-term training setting, i.e. 3M steps 6M. |
| Dataset Splits | No | The paper describes training in various environments for a certain number of steps (e.g., "3M steps 6M") and samples from a replay buffer, but it does not provide explicit training/test/validation dataset splits in the conventional sense for supervised learning. |
| Hardware Specification | Yes | Our codes are implemented with Python 3.8 and Torch 1.12.1. All experiments were run on NVIDIA GeForce GTX 3090 GPUs. |
| Software Dependencies | Yes | Our codes are implemented with Python 3.8 and Torch 1.12.1. |
| Experiment Setup | Yes | The hyper-parameters for TD3 are presented in Table 2. (...) For Humanoid and Ant tasks, we set grow interval T = 25000, grow number k = 0.01 rest capacity, prune upper bound ω = 0.4, ending step is the max training step, the threshold of ER is 0.35 and the decay weight α = 0.02 (which is used in all the tasks). For other OpenAI MuJoCo tasks, we set grow interval T = 20000, grow number k = 0.15 rest capacity, prune upper bound ω = 0.2, ending step is the max training step, the threshold of ER is 0.25. |
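The quoted NE settings can be collected into a small configuration sketch. This is an illustrative arrangement only, assuming a split into two task families as described in the excerpt; the key names and the `hyperparams_for` helper are hypothetical and not taken from the paper's released code.

```python
# Hypothetical grouping of the NE hyper-parameters quoted above.
# Values come from the excerpt; key names are illustrative assumptions.
NE_HYPERPARAMS = {
    "humanoid_ant": {          # Humanoid and Ant tasks
        "grow_interval_T": 25_000,       # steps between growth phases
        "grow_number_k": 0.01,           # fraction of remaining (rest) capacity grown
        "prune_upper_bound_omega": 0.4,
        "er_threshold": 0.35,
        "decay_weight_alpha": 0.02,      # stated to be shared across all tasks
    },
    "other_mujoco": {          # other OpenAI MuJoCo tasks
        "grow_interval_T": 20_000,
        "grow_number_k": 0.15,
        "prune_upper_bound_omega": 0.2,
        "er_threshold": 0.25,
        "decay_weight_alpha": 0.02,
    },
}

def hyperparams_for(task: str) -> dict:
    """Return the quoted NE settings for a task family (illustrative helper)."""
    family = "humanoid_ant" if task.lower() in {"humanoid", "ant"} else "other_mujoco"
    return NE_HYPERPARAMS[family]
```

For example, `hyperparams_for("Humanoid")` would select the first group, while any other MuJoCo task name falls through to the second.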