Bootstrapped Meta-Learning

Authors: Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh

ICLR 2022

Reproducibility assessment. Each entry below gives the variable, the result, and the supporting LLM response.
Research Type: Experimental
LLM Response: "Empirically, we find that BMG provides substantial performance improvements over standard meta-gradients in various settings. We obtain a new state-of-the-art result for model-free agents on Atari (Section 5.2) and improve upon MAML (Finn et al., 2017) in the few-shot setting (Section 6)."

Researcher Affiliation: Industry
LLM Response: "Sebastian Flennerhag (DeepMind), Yannick Schroecker (DeepMind), Tom Zahavy (DeepMind), Hado van Hasselt (DeepMind), David Silver (DeepMind), Satinder Singh (DeepMind)."

Pseudocode: Yes
LLM Response: "Algorithm 1: N-step RL actor loop."

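Algorithm 1 itself is not reproduced on this page, but the shape of an n-step actor loop is standard. The sketch below is a minimal illustration under stated assumptions (hypothetical `env`, `policy`, and `learner_queue` objects following a Gym-style interface), not the paper's exact algorithm.

```python
# Minimal sketch of an n-step RL actor loop (illustrative; not the
# paper's Algorithm 1 verbatim). `env`, `policy`, and `learner_queue`
# are hypothetical stand-ins with a Gym-style interface.
def actor_loop(env, policy, learner_queue, n=16):
    obs = env.reset()
    while True:
        rollout = []
        for _ in range(n):
            action = policy(obs)
            next_obs, reward, done, _ = env.step(action)
            rollout.append((obs, action, reward, done))
            obs = env.reset() if done else next_obs
        # The learner consumes rollouts to compute n-step returns and
        # update agent parameters and meta-parameters.
        learner_queue.put(rollout)
```
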
Open Source Code: No
LLM Response: "No explicit statement about the authors' own source code being released or available through a link."

Open Datasets: Yes
LLM Response: "Mini-Imagenet (Vinyals et al., 2016; Ravi & Larochelle, 2017) is a sub-sample of the Imagenet dataset (Deng et al., 2009). Specifically, it is a subset of 100 classes sampled randomly from the 1000 classes in the ILSVRC-12 training set, with 600 images for each class. We follow the standard protocol (Ravi & Larochelle, 2017) and split classes into non-overlapping meta-training, meta-validation, and meta-test sets with 64, 16, and 20 classes in each, respectively. The dataset is licensed under the MIT licence and the ILSVRC licence. The dataset can be obtained from https://paperswithcode.com/dataset/miniimagenet-1."

Dataset Splits: Yes
LLM Response: "We follow the standard protocol (Ravi & Larochelle, 2017) and split classes into non-overlapping meta-training, meta-validation, and meta-test sets with 64, 16, and 20 classes in each, respectively."

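To make the split protocol concrete, here is a minimal sketch of a non-overlapping 64/16/20 class partition. The standard Ravi & Larochelle protocol uses a fixed published class list; the seeded shuffle below is only an assumption used to illustrate the partition sizes and disjointness.

```python
# Illustrative 64/16/20 Mini-Imagenet class partition. The standard
# protocol uses a fixed published class list; a seeded shuffle is
# shown here only to demonstrate the non-overlapping split.
import random

classes = [f"class_{i:03d}" for i in range(100)]  # 100 classes total
random.Random(0).shuffle(classes)

meta_train = classes[:64]      # 64 meta-training classes
meta_val   = classes[64:80]    # 16 meta-validation classes
meta_test  = classes[80:]      # 20 meta-test classes

assert set(meta_train).isdisjoint(meta_val)
assert set(meta_train).isdisjoint(meta_test)
assert set(meta_val).isdisjoint(meta_test)
```
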
Hardware Specification: Yes
LLM Response: "IMPALA's distributed setup is implemented on a single machine with 56 CPU cores and 8 TPU (Jouppi et al., 2017) cores. 2 TPU cores are used to act in 48 environments asynchronously in parallel, sending rollouts to a replay buffer that a centralized learner uses to update agent parameters and meta-parameters. Gradient computations are distributed along the batch dimension across the remaining 6 TPU cores. All Atari experiments use this setup; training for 200 million frames takes 24 hours. Each model is trained on a single machine and runs on a V100 NVIDIA GPU."

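The batch-dimension gradient parallelism described above can be sketched with jax.pmap. This is an assumed JAX idiom for such a setup (the linear model and `loss_fn` are hypothetical), not the authors' implementation.

```python
# Sketch of batch-parallel gradient computation across accelerator
# cores, in the spirit of the setup described above (assumed JAX
# idiom; not the authors' code). `loss_fn` is a hypothetical loss.
import functools
import jax
import jax.numpy as jnp

def loss_fn(params, batch):
    preds = batch["x"] @ params                     # hypothetical linear model
    return jnp.mean((preds - batch["y"]) ** 2)

@functools.partial(jax.pmap, axis_name="batch")
def parallel_grads(params, shard):
    grads = jax.grad(loss_fn)(params, shard)        # per-core gradients
    return jax.lax.pmean(grads, axis_name="batch")  # average across cores
```
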
Software Dependencies: No
LLM Response: "No specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow) are provided."

Experiment Setup: Yes
LLM Response: "Table 1 (two-colors hyper-parameters) and Table 2 (Atari hyper-parameters) contain detailed lists of hyper-parameters for optimisers, network architectures, and RL-specific parameters. For instance, the actor-critic inner learner uses: optimiser SGD, learning rate 0.1, batch size 16 (losses are averaged), discount γ = 0.99."
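For concreteness, the quoted inner-learner hyper-parameters can be collected in a small config. The dataclass below is a hypothetical convenience wrapper; the values are taken from the quote above.

```python
# Two-colors actor-critic inner-learner hyper-parameters as quoted
# above (Table 1 of the paper). The dataclass is a hypothetical
# convenience wrapper, not the authors' configuration code.
from dataclasses import dataclass

@dataclass
class InnerLearnerConfig:
    optimiser: str = "SGD"
    learning_rate: float = 0.1
    batch_size: int = 16        # losses are averaged over the batch
    gamma: float = 0.99         # discount factor

config = InnerLearnerConfig()
```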