Bootstrapped Meta-Learning

Authors: Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh

ICLR 2022

Reproducibility assessment. Each entry below gives the variable, the result, and the supporting LLM response.
Research Type: Experimental
LLM Response: "Empirically, we find that BMG provides substantial performance improvements over standard meta-gradients in various settings. We obtain a new state-of-the-art result for model-free agents on Atari (Section 5.2) and improve upon MAML (Finn et al., 2017) in the few-shot setting (Section 6)."

Researcher Affiliation: Industry
LLM Response: "Sebastian Flennerhag (DeepMind), Yannick Schroecker (DeepMind), Tom Zahavy (DeepMind), Hado van Hasselt (DeepMind), David Silver (DeepMind), Satinder Singh (DeepMind)."

Pseudocode: Yes
LLM Response: "Algorithm 1: N-step RL actor loop."

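Algorithm 1 itself is not reproduced on this page, but the shape of an n-step actor loop is standard. The sketch below is a minimal illustration under stated assumptions (hypothetical `env`, `policy`, and `learner_queue` objects following a Gym-style interface), not the paper's exact algorithm.

```python
# Minimal sketch of an n-step RL actor loop (illustrative; not the
# paper's Algorithm 1 verbatim). `env`, `policy`, and `learner_queue`
# are hypothetical stand-ins with a Gym-style interface.
def actor_loop(env, policy, learner_queue, n=16):
    obs = env.reset()
    while True:
        rollout = []
        for _ in range(n):
            action = policy(obs)
            next_obs, reward, done, _ = env.step(action)
            rollout.append((obs, action, reward, done))
            obs = env.reset() if done else next_obs
        # The learner consumes rollouts to compute n-step returns and
        # update agent parameters and meta-parameters.
        learner_queue.put(rollout)
```
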
Open Source Code: No
LLM Response: "No explicit statement about the authors' own source code being released or available through a link."

Open Datasets: Yes
LLM Response: "Mini-Imagenet (Vinyals et al., 2016; Ravi & Larochelle, 2017) is a sub-sample of the Imagenet dataset (Deng et al., 2009). Specifically, it is a subset of 100 classes sampled randomly from the 1000 classes in the ILSVRC-12 training set, with 600 images for each class. We follow the standard protocol (Ravi & Larochelle, 2017) and split classes into non-overlapping meta-training, meta-validation, and meta-test sets with 64, 16, and 20 classes in each, respectively. The dataset is licensed under the MIT licence and the ILSVRC licence. The dataset can be obtained from https://paperswithcode.com/dataset/miniimagenet-1."

Dataset Splits: Yes
LLM Response: "We follow the standard protocol (Ravi & Larochelle, 2017) and split classes into non-overlapping meta-training, meta-validation, and meta-test sets with 64, 16, and 20 classes in each, respectively."

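To make the split protocol concrete, here is a minimal sketch of a non-overlapping 64/16/20 class partition. The standard Ravi & Larochelle protocol uses a fixed published class list; the seeded shuffle below is only an assumption used to illustrate the partition sizes and disjointness.

```python
# Illustrative 64/16/20 Mini-Imagenet class partition. The standard
# protocol uses a fixed published class list; a seeded shuffle is
# shown here only to demonstrate the non-overlapping split.
import random

classes = [f"class_{i:03d}" for i in range(100)]  # 100 classes total
random.Random(0).shuffle(classes)

meta_train = classes[:64]      # 64 meta-training classes
meta_val   = classes[64:80]    # 16 meta-validation classes
meta_test  = classes[80:]      # 20 meta-test classes

assert set(meta_train).isdisjoint(meta_val)
assert set(meta_train).isdisjoint(meta_test)
assert set(meta_val).isdisjoint(meta_test)
```
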
Hardware Specification: Yes
LLM Response: "IMPALA's distributed setup is implemented on a single machine with 56 CPU cores and 8 TPU (Jouppi et al., 2017) cores. 2 TPU cores are used to act in 48 environments asynchronously in parallel, sending rollouts to a replay buffer that a centralized learner uses to update agent parameters and meta-parameters. Gradient computations are distributed along the batch dimension across the remaining 6 TPU cores. All Atari experiments use this setup; training for 200 million frames takes 24 hours. Each model is trained on a single machine and runs on a V100 NVIDIA GPU."

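The batch-dimension gradient parallelism described above can be sketched with jax.pmap. This is an assumed JAX idiom for such a setup (the linear model and `loss_fn` are hypothetical), not the authors' implementation.

```python
# Sketch of batch-parallel gradient computation across accelerator
# cores, in the spirit of the setup described above (assumed JAX
# idiom; not the authors' code). `loss_fn` is a hypothetical loss.
import functools
import jax
import jax.numpy as jnp

def loss_fn(params, batch):
    preds = batch["x"] @ params                     # hypothetical linear model
    return jnp.mean((preds - batch["y"]) ** 2)

@functools.partial(jax.pmap, axis_name="batch")
def parallel_grads(params, shard):
    grads = jax.grad(loss_fn)(params, shard)        # per-core gradients
    return jax.lax.pmean(grads, axis_name="batch")  # average across cores
```
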
Software Dependencies: No
LLM Response: "No specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow) are provided."

Experiment Setup: Yes
LLM Response: "Table 1 (two-colors hyper-parameters) and Table 2 (Atari hyper-parameters) contain detailed lists of hyper-parameters for optimisers, network architectures, and RL-specific parameters. For instance, the actor-critic inner learner uses: optimiser SGD, learning rate 0.1, batch size 16 (losses are averaged), discount γ = 0.99."
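For concreteness, the quoted inner-learner hyper-parameters can be collected in a small config. The dataclass below is a hypothetical convenience wrapper; the values are taken from the quote above.

```python
# Two-colors actor-critic inner-learner hyper-parameters as quoted
# above (Table 1 of the paper). The dataclass is a hypothetical
# convenience wrapper, not the authors' configuration code.
from dataclasses import dataclass

@dataclass
class InnerLearnerConfig:
    optimiser: str = "SGD"
    learning_rate: float = 0.1
    batch_size: int = 16        # losses are averaged over the batch
    gamma: float = 0.99         # discount factor

config = InnerLearnerConfig()
```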