Meta-Learning under Task Shift
Authors: Lei Sun, Yusuke Tanaka, Tomoharu Iwata
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation on few-shot classification datasets demonstrates a significant improvement of IWML over existing approaches. Section 5, "Experimental Evaluation", details the datasets, comparative methods, experimental settings, and results, with accuracy tables and ablation studies. |
| Researcher Affiliation | Collaboration | Lei Sun (Nara Institute of Science and Technology), Yusuke Tanaka (NTT Communication Science Laboratories), Tomoharu Iwata (NTT Communication Science Laboratories). The affiliations include both an academic institution (Nara Institute of Science and Technology) and an industrial research laboratory (NTT Communication Science Laboratories). |
| Pseudocode | Yes | The paper includes "Algorithm 1 Importance estimation procedure of IWML" and "Algorithm 2 Meta-training procedure of IWML" on page 6. |
| Open Source Code | No | The paper does not provide an explicit statement about open-source code release or a direct link to a code repository. The OpenReview link provided is for the paper's reviews, not code. |
| Open Datasets | Yes | We evaluated our proposed method using three datasets: miniImageNet, Omniglot, and tieredImageNet. The miniImageNet dataset, first introduced by Vinyals et al. (2016), is a subset of the larger ILSVRC-12 dataset (Russakovsky et al., 2015). The Omniglot dataset (Lake et al., 2011) includes 1,623 unique, hand-drawn characters from 50 different alphabets. The tieredImageNet dataset was initially proposed by Ren et al. (2018). |
| Dataset Splits | Yes | For both miniImageNet and Omniglot, we simulated a task-shift scenario by splitting all classes within each dataset into two clusters based on MMD distance... The meta-training-validation datasets consisted of all classes within cluster A and a randomly selected 10% of the classes within cluster B. The meta-training datasets comprised 70% of the classes from the meta-training-validation datasets... The meta-validation datasets consisted of the remaining 30%... The meta-test datasets were composed of 45%... The unlabeled datasets for importance estimation were made up of the remaining 45%... For tieredImageNet... we followed the original paper's (Ren et al., 2018) method to split the training, validation, and test datasets. The unlabeled datasets used for importance estimation consisted of 40 classes randomly selected from the test dataset without replacement. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using Adam (Kingma & Ba, 2014) as an optimizer and various neural network components (convolutional blocks, batch normalization, ReLU, softmax, cross-entropy loss), but it does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | Our network architecture, closely mirroring the one used in the referenced study (Vinyals et al., 2016), comprises four convolutional blocks. Each block consists of a 3×3 convolution with 128 filters, a subsequent batch normalization layer (Ioffe & Szegedy, 2015), a ReLU nonlinearity, and finally a 2×2 max-pooling layer... We carry out a single gradient update with a fixed step size α = 0.05 in the inner loop, while Adam (Kingma & Ba, 2014) serves as the outer-loop optimizer with step size β = 10⁻⁴... During the meta-training phase, we trained the proposed IWML and the comparative methods using 250,000 meta-training tasks with a meta-batch size of 50. ... We employed 30 validation tasks for early stopping. ... Each task was constructed using a 5-way setting with 1-shot, 2-shot, or 3-shot configurations for miniImageNet and tieredImageNet, and a 20-way setting with the same shot variations for Omniglot. ... For the smoothing scalar parameters σ and h: the value of σ for each task is determined by the median of the pairwise squared distances {‖g_ϕ(x_n) − g_ϕ(x_n′)‖²} over n, n′ = 1, …, N. Similarly, h is determined by the median of {d(q(x), q_t^TR(x))} over the meta-training tasks t = 1, …, T^TR. |
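The median heuristic quoted in the Experiment Setup row (setting σ from pairwise squared feature distances, and h analogously from task-level MMD distances) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function name is an assumption, and the feature extractor g_ϕ is represented simply as precomputed embedding vectors.

```python
import statistics

def median_heuristic(embeddings):
    """Median of pairwise squared Euclidean distances, as described in the
    paper for setting the smoothing scalar sigma per task.

    `embeddings` is a list of feature vectors, i.e. g_phi(x_n) for each
    support/query example; the same heuristic applied to a list of
    task-level distances d(q, q_t^TR) would yield h.
    """
    dists = [
        sum((a - b) ** 2 for a, b in zip(u, v))
        for i, u in enumerate(embeddings)
        for v in embeddings[i + 1:]  # each unordered pair once
    ]
    return statistics.median(dists)

# Squared distances between the three points are [1, 1, 2], so the
# median is 1.
print(median_heuristic([[0, 0], [1, 0], [0, 1]]))  # → 1
```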
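The class-level split percentages quoted in the Dataset Splits row (10% of cluster B joining cluster A, then a 70/30 meta-training/meta-validation split, with the remaining cluster-B classes divided 45/45 between meta-test and unlabeled data) can be sketched as follows. All names, the seeding, and the assumption that the 45/45 split applies to the cluster-B classes not already sampled are illustrative, not taken from the authors' implementation.

```python
import random

def split_classes(cluster_a, cluster_b, seed=0):
    """Illustrative class-level partition mimicking the task-shift
    protocol described in the paper (names and RNG handling assumed)."""
    rng = random.Random(seed)

    # Meta-training-validation pool: all of cluster A plus 10% of cluster B.
    b_sample = rng.sample(cluster_b, k=max(1, len(cluster_b) // 10))
    pool = list(cluster_a) + b_sample
    rng.shuffle(pool)

    # 70% meta-training / 30% meta-validation.
    cut = int(0.7 * len(pool))
    meta_train, meta_val = pool[:cut], pool[cut:]

    # Remaining cluster-B classes split evenly: meta-test vs. unlabeled
    # classes used for importance estimation.
    rest = [c for c in cluster_b if c not in b_sample]
    rng.shuffle(rest)
    half = len(rest) // 2
    meta_test, unlabeled = rest[:half], rest[half:]
    return meta_train, meta_val, meta_test, unlabeled
```

With 20 classes in cluster A and 60 in cluster B, this yields 18 meta-training, 8 meta-validation, 27 meta-test, and 27 unlabeled classes, with meta-test and unlabeled classes disjoint.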