Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Neuroplastic Expansion in Deep Reinforcement Learning

Authors: Jiashun Liu, Johan S. Obando-Ceron, Aaron Courville, Ling Pan

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate that NE effectively mitigates plasticity loss and outperforms state-of-the-art methods across various tasks in MuJoCo and DeepMind Control Suite environments.
Researcher Affiliation Academia Jiashun Liu, HKUST; Johan Obando-Ceron, Mila – Québec AI Institute, Université de Montréal; Aaron Courville, Mila – Québec AI Institute, Université de Montréal; Ling Pan, HKUST (corresponding author, email: EMAIL)
Pseudocode Yes Appendix D (Pseudocode): D.1 Pseudo-code for NE — Algorithm 1: Neuroplastic Expansion TD3 (...); D.2 Pseudo-code for Truncate Process — Algorithm 2: Truncate Process
Open Source Code Yes We make our code publicly available.
Open Datasets Yes Extensive experiments demonstrate that NE effectively mitigates plasticity loss and outperforms state-of-the-art methods across various tasks in MuJoCo and DeepMind Control Suite environments. (...) We conduct a series of experiments based on the standard continuous control tasks from OpenAI Gym (Brockman et al., 2016) simulated by MuJoCo (Todorov et al., 2012) with a long-term training setting, i.e., 3M steps → 6M.
Dataset Splits No The paper describes training in various environments for a certain number of steps (e.g., "3M steps → 6M") and samples from a replay buffer, but it does not provide explicit training/test/validation dataset splits in the conventional sense for supervised learning.
Hardware Specification Yes Our codes are implemented with Python 3.8 and Torch 1.12.1. All experiments were run on NVIDIA GeForce GTX 3090 GPUs.
Software Dependencies Yes Our codes are implemented with Python 3.8 and Torch 1.12.1.
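The stated stack is Python 3.8 with Torch 1.12.1. A minimal sketch of a version guard matching that stack is shown below; it is illustrative only (the authors' repository may pin versions differently), and the torch check is omitted so the snippet runs without torch installed.

```python
import sys

def check_python(required=(3, 8)) -> bool:
    """Return True if the running interpreter meets the stated Python 3.8 minimum."""
    return sys.version_info[:2] >= required

# Torch 1.12.1 would additionally be verified via `torch.__version__`
# after `pip install torch==1.12.1` (assumption: CPU/CUDA build unspecified).
print(check_python())
```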
Experiment Setup Yes The hyper-parameters for TD3 are presented in Table 2. (...) For Humanoid and Ant tasks, we set grow interval T = 25000, grow number k = 0.01 of rest capacity, prune upper bound ω = 0.4, ending step is the max training step, the threshold of ER is 0.35, and the decay weight α = 0.02 (which is used in all the tasks). For other OpenAI MuJoCo tasks, we set grow interval T = 20000, grow number k = 0.15 of rest capacity, prune upper bound ω = 0.2, ending step is the max training step, and the threshold of ER is 0.25.
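The per-task settings quoted above can be collected into a small config for reference. This is a sketch only: the key names and the `hparams_for` helper are assumptions for illustration, while the numeric values are copied from the quoted setup.

```python
# Hedged sketch: NE hyper-parameters as quoted in the report, grouped per task
# family. Key names are illustrative, not taken from the authors' code.
NE_HPARAMS = {
    "humanoid_ant": {
        "grow_interval_T": 25_000,
        "grow_number_k": 0.01,          # fraction of rest (remaining) capacity
        "prune_upper_bound_omega": 0.4,
        "er_threshold": 0.35,
        "decay_weight_alpha": 0.02,     # stated as shared across all tasks
    },
    "other_mujoco": {
        "grow_interval_T": 20_000,
        "grow_number_k": 0.15,
        "prune_upper_bound_omega": 0.2,
        "er_threshold": 0.25,
        "decay_weight_alpha": 0.02,
    },
}

def hparams_for(task: str) -> dict:
    """Select the hyper-parameter group for a task name (illustrative helper)."""
    group = "humanoid_ant" if task in ("Humanoid", "Ant") else "other_mujoco"
    return NE_HPARAMS[group]
```

In both groups the ending step equals the maximum training step, so the grow/prune schedule runs for the full duration of training.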