Continual HyperTransformer: A Meta-Learner for Continual Few-Shot Learning

Authors: Max Vladymyrov, Andrey Zhmoginov, Mark Sandler

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate that our proposed Continual HyperTransformer method equipped with a prototypical loss is capable of learning and retaining knowledge about past tasks for a variety of scenarios, including learning from mini-batches, and task-incremental and class-incremental learning scenarios. ... Most of our experiments were conducted using two standard benchmark problems using Omniglot and tieredImageNet datasets."
Researcher Affiliation | Industry | Max Vladymyrov (Google Research), Andrey Zhmoginov (Google Research), Mark Sandler (Google Research)
Pseudocode | Yes | "Algorithm 1: Class-incremental learning using HyperTransformer with Prototypical Loss. Input: T randomly sampled K-way N-shot episodes {S(t), Q(t)}, t = 0, ..., T. Output: the loss value J for the generated set of tasks."
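The quoted Algorithm 1 accumulates a prototypical loss over a sequence of episodes, but the paper's implementation is not available. As a rough point of reference, below is a minimal NumPy sketch of a standard prototypical loss in the style of Snell et al. (2017); the function name, array shapes, and the use of squared Euclidean distance as the logit are our assumptions, not the authors' code:

```python
import numpy as np

def prototypical_loss(support_emb, support_y, query_emb, query_y, num_classes):
    """Cross-entropy of query embeddings against class prototypes.

    support_emb, query_emb: float arrays of shape (n_samples, emb_dim).
    support_y, query_y: int class labels in [0, num_classes).
    """
    # Prototype for each class: mean of that class's support embeddings.
    protos = np.stack([support_emb[support_y == k].mean(axis=0)
                       for k in range(num_classes)])
    # Negative squared Euclidean distance to each prototype acts as the logit.
    dists = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    logits = -dists
    # Numerically stable log-softmax, then average negative log-likelihood.
    logits -= logits.max(axis=1, keepdims=True)
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(query_y)), query_y].mean()
```

In the class-incremental setting described by the paper, a loss of this form would be evaluated after each task t using prototypes carried over from all tasks seen so far.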
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code, nor does it include a link to a code repository.
Open Datasets | Yes | "Most of our experiments were conducted using two standard benchmark problems using Omniglot and tieredImageNet datasets. ... We verify this by creating a multi-domain episode generator that includes tasks from various image datasets: Omniglot, Caltech101, CaltechBirds2011, Cars196, OxfordFlowers102 and StanfordDogs."
Dataset Splits | No | "The reported accuracy was calculated from 1024 random episodic evaluations from a separate test distribution, with each episode run 16 times with different combinations of input samples. ... We compare the performance of CHT to two baseline models. The first is a Constant ProtoNet (ConstPN), which represents a vanilla Prototypical Network, as described in Snell et al. (2017). In this approach, a universal fixed CNN network is trained on episodes from C_train. ... Finally, we can test the performance of the trained model a_ψ on episodes sampled from a holdout set of classes C_test."
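The evaluation protocol above repeatedly samples K-way N-shot episodes from a held-out class split C_test. The paper's episode generator is not released; a generic sampler consistent with that protocol might look like the following sketch (the dict-based dataset layout and function name are illustrative assumptions):

```python
import random

def sample_episode(dataset, k_way, n_shot, n_query, rng=random):
    """Sample one K-way N-shot episode.

    dataset: dict mapping class label -> list of examples.
    Returns (support, query) lists of (example, episode_label) pairs,
    where labels are re-indexed to 0..k_way-1 within the episode.
    """
    classes = rng.sample(sorted(dataset), k_way)
    support, query = [], []
    for episode_label, c in enumerate(classes):
        picks = rng.sample(dataset[c], n_shot + n_query)
        support += [(x, episode_label) for x in picks[:n_shot]]
        query += [(x, episode_label) for x in picks[n_shot:]]
    return support, query
```

Reported accuracy would then be the mean over many such episodes (1024 in the paper, each repeated 16 times with different sample combinations).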
Hardware Specification | No | "In all our experiments, we trained the network on a single GPU for 4M steps with SGD with an exponential LR decay over 100,000 steps with a decay rate of 0.97." (The GPU model is not specified.)
Software Dependencies | No | The paper mentions 'SGD' as an optimizer and 'Transformer' and 'CNN' architectures, but does not specify software dependencies with version numbers (e.g., Python or PyTorch versions).
Experiment Setup | Yes | "In all our experiments, we trained the network on a single GPU for 4M steps with SGD with an exponential LR decay over 100,000 steps with a decay rate of 0.97. We noticed some stability issues when increasing the number of tasks and had to decrease the learning rate to compensate: for Omniglot experiments, we used a learning rate of 10^-4 for up to 4 tasks and 5x10^-5 for 5 tasks. For tieredImageNet, we used the same learning rate of 5x10^-6 for training with any number of tasks T. ... The generated weights for each task θ_t are composed of four convolutional blocks and a single dense layer. Each convolutional block consists of a 3x3 convolutional layer, a batch-norm layer, a ReLU activation and a 2x2 max-pooling layer. For Omniglot we used 8 filters for the convolutional layers and a 20-dim FC layer to demonstrate how the network works on small problems, and for tieredImageNet we used 64 filters for the convolutional layers and a 40-dim FC layer."
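The quoted schedule ("exponential LR decay over 100,000 steps with a decay rate of 0.97") matches the standard exponential-decay rule; a minimal sketch, assuming continuous rather than staircase decay (the paper does not say which):

```python
def exponential_lr(step, base_lr=1e-4, decay_rate=0.97, decay_steps=100_000):
    """Learning rate after `step` SGD steps under exponential decay.

    lr(step) = base_lr * decay_rate ** (step / decay_steps)
    base_lr defaults to the Omniglot value quoted above.
    """
    return base_lr * decay_rate ** (step / decay_steps)
```

Under this rule, the full 4M-step run decays the rate by a factor of 0.97**40 ≈ 0.30, i.e. the Omniglot learning rate ends near 3x10^-5.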