Non-Stationary Learning of Neural Networks with Automatic Soft Parameter Reset

Authors: Alexandre Galashov, Michalis Titsias, András György, Clare Lyle, Razvan Pascanu, Yee Whye Teh, Maneesh Sahani

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We show empirically that our approach performs well in non-stationary supervised and off-policy reinforcement learning settings." |
| Researcher Affiliation | Collaboration | Alexandre Galashov (Gatsby Unit, UCL; Google DeepMind), Michalis K. Titsias (Google DeepMind), András György (Google DeepMind), Clare Lyle (Google DeepMind), Razvan Pascanu (Google DeepMind), Yee Whye Teh (Google DeepMind; University of Oxford), Maneesh Sahani (Gatsby Unit, UCL) |
| Pseudocode | Yes | "Algorithm 1: Soft-Reset algorithm" (a hedged sketch of a soft reset step follows the table) |
| Open Source Code | No | "Unfortunately, due to IP constraints, we cannot release the code for the paper." |
| Open Datasets | Yes | "subset of 10000 images from either CIFAR-10 [32] or MNIST" and the Hopper-v5 and Humanoid-v4 GYM [6] environments (a subset-loading sketch follows the table) |
| Dataset Splits | No | The paper defines metrics such as 'average per-task online accuracy' (Section 5, H.1), which evaluate performance during training, and it describes training regimes (e.g., '400 epochs on a task with a batch size of 128'), but it does not specify a separate validation split (e.g., '10% of data used for validation'). |
| Hardware Specification | Yes | "For each experiment, we used 3 hours on an A100 GPU with 40 GB of memory." |
| Software Dependencies | No | "We ran a SAC [19] agent with default parameters from Brax [15] on the Hopper-v5 and Humanoid-v4 GYM [6] environments." No version numbers for Brax, GYM, SAC, Python, or other libraries are given. |
| Experiment Setup | Yes | "For all the experiments, we run a sweep over the hyperparameters. We select the best hyperparameters based on the smallest cumulative error (the sum of the per-step errors over the whole training run). We then report the mean and the standard deviation across 3 seeds in all the plots. Hyperparameter ranges: the learning rate α used to update parameters is selected, for all methods, from {1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1, 1.0}. The λ_init parameter in L2 Init is selected from {10.0, 1.0, 0.0, 1e-1, ...}. For S&P, the shrink parameter λ is selected from {1.0, 0.99999, ...}, and the perturbation parameter σ from {1e-1, ...}. For Soft Resets, the learning rate for γ_t is selected from {0.5, 0.1, ...}, the constant s from {1.0, 0.95, ...}, and the temperature λ in Eq. (45) from {1.0, 0.1, 0.01}..." (a sweep sketch follows the table) |
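
Since the authors cannot release the code, a minimal sketch of one soft parameter reset step may help orient readers. It assumes the reset interpolates between the current parameters and the initialization, with the mixing weight `gamma` playing the role of the paper's γ_t; the adaptive update of γ_t in Algorithm 1 is omitted, and the names here (`soft_reset`, `sigma0`) are illustrative, not the paper's API.

```python
import numpy as np

def soft_reset(params, init_params, gamma, sigma0=0.01, rng=None):
    """Hypothetical soft parameter reset step.

    gamma = 1 keeps the current parameters unchanged; gamma = 0 is a
    hard reset to a draw from the initialization distribution. The
    noise scale shrinks as gamma -> 1, so the update interpolates
    between "keep" and "reinitialize" rather than just adding noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = sigma0 * np.sqrt(1.0 - gamma**2) * rng.standard_normal(params.shape)
    return gamma * params + (1.0 - gamma) * init_params + noise

# Example: a strong soft reset (gamma = 0.2) pulls drifted weights
# most of the way back toward their initial values.
theta0 = np.zeros(4)                      # initial parameters
theta = np.array([2.0, -1.5, 0.7, 3.1])   # drifted parameters
print(soft_reset(theta, theta0, gamma=0.2))
```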
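The Open Datasets row mentions a 10,000-image subset of CIFAR-10 or MNIST, but the paper does not state how the subset was drawn. A fixed-seed random choice, as below, is one reproducible assumption; `torchvision` is used here purely for convenience and is not named in the paper.

```python
import numpy as np
from torchvision import datasets

# Hypothetical reconstruction of the 10,000-image subset (CIFAR-10
# shown; MNIST is analogous). The sampling scheme is an assumption:
# the paper does not specify how its subset was selected.
train = datasets.CIFAR10(root="./data", train=True, download=True)
rng = np.random.default_rng(0)            # fixed seed for reproducibility
idx = rng.choice(len(train), size=10_000, replace=False)
images = train.data[idx]                  # (10000, 32, 32, 3) uint8
labels = np.asarray(train.targets)[idx]   # (10000,) int labels
```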
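The Experiment Setup row describes a grid sweep in which the best hyperparameters are chosen by smallest cumulative error and results are reported as mean ± standard deviation over 3 seeds. The sketch below shows that selection logic only; `run_experiment` is a dummy stand-in for an actual training run, not anything from the paper.

```python
import numpy as np

def run_experiment(lr, seed):
    """Dummy stand-in for one training run: a real run would train
    the network and return its per-step errors."""
    rng = np.random.default_rng(seed)
    return lr * rng.random(1000)  # placeholder per-step errors

learning_rates = [1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1, 1.0]
seeds = [0, 1, 2]

best = None
for lr in learning_rates:
    # Cumulative error per seed, then averaged across seeds: the
    # selection criterion described in the Experiment Setup row.
    cum_errors = [float(np.sum(run_experiment(lr, s))) for s in seeds]
    mean, std = np.mean(cum_errors), np.std(cum_errors)
    if best is None or mean < best[0]:
        best = (mean, std, lr)

mean, std, lr = best
print(f"best lr = {lr}: cumulative error {mean:.3f} +/- {std:.3f} (3 seeds)")
```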