Non-Stationary Learning of Neural Networks with Automatic Soft Parameter Reset
Authors: Alexandre Galashov, Michalis Titsias, András György, Clare Lyle, Razvan Pascanu, Yee Whye Teh, Maneesh Sahani
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically that our approach performs well in non-stationary supervised and off-policy reinforcement learning settings. |
| Researcher Affiliation | Collaboration | Alexandre Galashov (Gatsby Unit, UCL; Google DeepMind); Michalis K. Titsias (Google DeepMind); András György (Google DeepMind); Clare Lyle (Google DeepMind); Razvan Pascanu (Google DeepMind); Yee Whye Teh (Google DeepMind; University of Oxford); Maneesh Sahani (Gatsby Unit, UCL) |
| Pseudocode | Yes | Algorithm 1 Soft-Reset algorithm |
| Open Source Code | No | Unfortunately, due to IP constraints, we cannot release the code for the paper. |
| Open Datasets | Yes | a subset of 10,000 images from either CIFAR-10 [32] or MNIST, and the Hopper-v5 and Humanoid-v4 GYM [6] environments |
| Dataset Splits | No | The paper defines metrics like 'average per-task online accuracy' (Section 5, H.1) which evaluates performance during training. It describes training regimes (e.g., '400 epochs on a task with a batch size of 128') but does not specify a separate validation dataset split (e.g., '10% of data used for validation'). |
| Hardware Specification | Yes | For each experiment, we used 3 hours on an A100 GPU with 40 GB of memory. |
| Software Dependencies | No | We ran SAC [19] agent with default parameters from Brax [15] on the Hopper-v5 and Humanoid-v4 GYM [6] environments. No specific version numbers for Brax, GYM, SAC, Python, or other libraries are given. |
| Experiment Setup | Yes | For all the experiments, we run a sweep over the hyperparameters. We select the best hyperparameters based on the smallest cumulative error (sum of all per-step errors throughout the training). We then report the mean and the standard deviation across 3 seeds in all the plots. Hyperparameter ranges: the learning rate α used to update parameters is, for all methods, selected from {1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1, 1.0}. The λ_init parameter in L2 Init is selected from {10.0, 1.0, 0.0, 1e-1, ...}. For S&P, the shrink parameter λ is selected from {1.0, 0.99999, ...}, and the perturbation parameter σ from {1e-1, ...}. For Soft Resets, the learning rate for γ_t is selected from {0.5, 0.1, ...}, the constant s is selected from {1.0, 0.95, ...}, and the temperature λ in (45) is selected from {1.0, 0.1, 0.01}... |
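The sweep procedure quoted above (grid search over hyperparameters, selection by smallest cumulative error, averaging over 3 seeds) can be sketched as follows. This is a minimal illustration, not the paper's code: the grids copy the (truncated) ranges quoted in the table, and `train_run` is a hypothetical stand-in for an actual training loop.

```python
import itertools
import random

# Hyperparameter grids as quoted in the paper (truncated where the quote
# uses "..."); names like "gamma_lr" are illustrative, not the paper's.
grid = {
    "learning_rate": [1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1, 1.0],
    "gamma_lr": [0.5, 0.1],          # learning rate for gamma_t (truncated)
    "s": [1.0, 0.95],                # constant s (truncated)
    "temperature": [1.0, 0.1, 0.01], # temperature lambda in (45)
}

def train_run(params, seed):
    """Hypothetical stand-in: returns a cumulative error for one run."""
    rng = random.Random((hash(tuple(sorted(params.items()))) ^ seed) & 0xFFFF)
    return rng.uniform(0.0, 1.0)

def sweep(grid, seeds=(0, 1, 2)):
    """Grid search: pick the configuration with the smallest cumulative
    error, averaged over the given seeds."""
    best_params, best_err = None, float("inf")
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        err = sum(train_run(params, s) for s in seeds) / len(seeds)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err

best, err = sweep(grid)
print(best, err)
```

After selection, the paper reports mean and standard deviation across the 3 seeds for the chosen configuration; the same `seeds` tuple would be reused for that final report.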