Unsupervised Meta-Learning via In-Context Learning

Authors: Anna Vettoruzzo, Lorenzo Braccaioli, Joaquin Vanschoren, Marlena Nowaczyk

ICLR 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Experimental results on benchmark datasets showcase the superiority of our approach over existing unsupervised meta-learning baselines, establishing it as the new state-of-the-art. Remarkably, our method achieves competitive results with supervised and self-supervised approaches, underscoring its efficacy in leveraging generalization over memorization. Throughout extensive experiments we demonstrate the effectiveness of the proposed approach to generalize to new tasks in real-time. Particularly, CAMeLU outperforms other UML baselines across several datasets, establishing itself as the state-of-the-art in the field. It also achieves comparable results to its supervised counterpart and to SSL approaches.
Researcher Affiliation | Academia | Anna Vettoruzzo (Halmstad University, Sweden), Lorenzo Braccaioli (University of Trento, Italy), Joaquin Vanschoren (Eindhoven University of Technology, Netherlands), Marlena Nowaczyk (Halmstad University, Sweden)
Pseudocode | No | The paper describes the proposed approach and task creation mechanism in text (Sections 3 and 3.1) and provides a visualization in Figure 1, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' block with structured steps.
Open Source Code | Yes | The complete codebase, models, and pre-trained weights are available on GitHub (https://github.com/bracca95/CAMeLU.git) to facilitate further research and replication.
Open Datasets | Yes | For the evaluation, we use two generic object recognition datasets, i.e., miniImageNet (Ravi & Larochelle, 2016) and CIFAR-fs (Bertinetto et al., 2019), and three fine-grained image classification datasets, i.e., CUB (Wah et al., 2011), Aircraft (Maji et al., 2013), and Meta-iNat (Wertheimer & Hariharan, 2019). ... For training CAMeLU, we use ImageNet-964, which is a variant of the original ImageNet-1k dataset (Deng et al., 2009)... When a multi-dataset approach is utilized for training CAMeLU (see Appendix A.3), MSCOCO (Lin et al., 2014) and Fungi (Schroeder & Cui, 2018) are loaded into the program and used together with ImageNet-964 for creating the whole training dataset.
Dataset Splits | Yes | Each dataset is split into training, validation, and test sets following the splits in Ravi & Larochelle (2016) and Bertinetto et al. (2019) for miniImageNet and CIFAR-fs, respectively, and in Triantafillou et al. (2019) and Poulakakis-Daktylidis & Jamali-Rad (2024) for the remaining datasets. All labels are removed from the datasets during the training phase. ... miniImageNet is split into train/validation/test using the splits proposed in Ravi & Larochelle (2016), resulting in 38,400 images for training, 9,600 for validation, and 12,000 for testing. The same number of images are also present in CIFAR-fs, and the splits follow the work in Bertinetto et al. (2019). CUB and Aircraft, instead, are two fine-grained datasets with a smaller size compared to the others. CUB (Wah et al., 2011) consists of 8,239 images in the training set, 1,779 in the validation set, and 1,770 in the test set, while Aircraft has respectively 7,000/1,500/1,500 images in the train/validation/test sets (Triantafillou et al., 2019).
Hardware Specification | Yes | The experiments are executed using Python and the PyTorch library on an Nvidia GeForce RTX 3070 Ti Laptop GPU with 8GB of VRAM, while ablation studies and competitors are executed on an Nvidia A100-SXM4 GPU with 40GB of VRAM.
Software Dependencies | No | The paper mentions 'Python and the PyTorch library' for execution and references the 'Hugging Face website' and 'PyTorch Lightning Bolts' for some components, but it does not specify version numbers for any of these software dependencies.
Experiment Setup | Yes | All models are trained for 100 epochs with 500 episodes per epoch. ... For CAMeLU, we use a ResNet-50 (He et al., 2016) feature extractor pretrained on ImageNet-964 and a class encoder that maps one-hot label vectors to a 256-dimensional space. ... The transformer encoder consists of 8 layers, each with an eight-head self-attention block, an MLP, and a single projection layer that maps the transformer output to the predicted category. The model is trained with the Adam optimizer with a learning rate of 10^-5 and a warmup cosine scheduler (Vaswani et al., 2017). ... The training of CAMeLU is performed for 100 epochs, with 500 episodes each, using the Adam optimizer with an initial learning rate of 10^-5 and a warmup cosine scheduler with 1500 warmup steps and a final learning rate of 10^-6.
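The schedule quoted above (warmup for 1500 steps to a peak learning rate of 10^-5, then cosine decay to a final rate of 10^-6 over the 100 epochs x 500 episodes of training) can be sketched in plain Python. This is a hypothetical reconstruction, not the authors' code: the function name is ours, and we assume the warmup is linear from zero, which the paper does not state.

```python
import math

def warmup_cosine_lr(step, total_steps=100 * 500, warmup_steps=1500,
                     peak_lr=1e-5, final_lr=1e-6):
    """Hypothetical warmup-cosine schedule matching the quoted setup.

    Linear warmup from 0 to peak_lr over `warmup_steps`, then cosine
    decay from peak_lr down to final_lr over the remaining steps.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1 + math.cos(math.pi * progress))
```

In a PyTorch training loop this shape of schedule is typically wrapped in `torch.optim.lr_scheduler.LambdaLR` around the Adam optimizer; whether the authors used that wrapper or a library scheduler is not specified in the quoted text.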