Learning an Explicit Hyper-parameter Prediction Function Conditioned on Tasks

Authors: Jun Shu, Deyu Meng, Zongben Xu

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. "We highlight the utility of our SLeM framework for obtaining the learning guarantees of some typical meta-learning applications, including few-shot regression, few-shot classification and domain generalization. The theory-induced meta-regularization control effects of the meta-learner are empirically verified to be effective for consistently improving its generalization capability in these applications. The source code of our method is released at https://github.com/xjtushujun/SLeM-Theory."
Researcher Affiliation: Academia. Jun Shu: School of Mathematics and Statistics and Ministry of Education Key Lab of Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an, Shaanxi Province, P. R. China; Pazhou Lab (Huangpu), Guangzhou, Guangdong Province, P. R. China. Deyu Meng: School of Mathematics and Statistics and Ministry of Education Key Lab of Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an, Shaanxi Province, P. R. China; Pazhou Lab (Huangpu), Guangzhou, Guangdong Province, P. R. China; Macau Institute of Systems Engineering, Macau University of Science and Technology, Taipa, Macau, P. R. China. Zongben Xu: School of Mathematics and Statistics and Ministry of Education Key Lab of Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an, Shaanxi Province, P. R. China; Pazhou Lab (Huangpu), Guangzhou, Guangdong Province, P. R. China.
Pseudocode: Yes. Algorithm 1 (Meta-Training: Learning the Methodology...), Algorithm 2 (Meta-Test: Generalization to New Query Tasks...), Algorithm 3 (Online Methodology-Learning Algorithm: FTML).
Open Source Code: Yes. "The source code of our method is released at https://github.com/xjtushujun/SLeM-Theory."
Open Datasets: Yes. "The CIFAR-FS dataset (Bertinetto et al., 2019) is a recently proposed few-shot image classification benchmark... The miniImageNet dataset (Vinyals et al., 2016) is a standard benchmark... The tieredImageNet benchmark (Ren et al., 2019) is a larger subset of ILSVRC-2012 (Russakovsky et al., 2015)... CUB dataset (Wah et al., 2011)... The PACS dataset is a recent object recognition benchmark for domain generalisation (Li et al., 2017a)... The Visual Decathlon (VD) dataset consists of ten well-known datasets from multiple visual domains (Rebuffi et al., 2017)."
Dataset Splits: Yes. "The CIFAR-FS dataset ... The classes are randomly split into 64, 16 and 20 for meta-training, meta-validation, and meta-testing, respectively. The FC100 dataset ... partitioned into 60 classes from 12 superclasses for meta-training, 20 classes from 4 superclasses for meta-validation, and 20 classes from 4 superclasses for meta-testing. The miniImageNet dataset ... The meta-training, meta-validation, and meta-testing sets contain 64, 16 and 20 classes randomly split from 100 classes, respectively. We use the commonly-used split proposed by (Ravi and Larochelle, 2017). The tieredImageNet benchmark ... These categories are then split into 3 disjoint sets: 20 categories for meta-training, 6 for meta-validation, and 8 for meta-testing. This corresponds to 351, 97 and 160 classes for meta-training, meta-validation, and meta-testing, respectively."
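The class-level splits quoted above all follow the same pattern: partition the label set into disjoint meta-training / meta-validation / meta-testing pools. A minimal sketch of such a split, using the miniImageNet 64/16/20 numbers; the function name, seed handling, and 0-based class IDs are illustrative assumptions, not the paper's code:

```python
import random

def split_classes(class_ids, n_train=64, n_val=16, n_test=20, seed=0):
    """Randomly partition class labels into disjoint meta-train,
    meta-validation, and meta-test pools (miniImageNet-style 64/16/20)."""
    assert len(class_ids) == n_train + n_val + n_test
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(class_ids)
    rng.shuffle(shuffled)
    return (shuffled[:n_train],                       # meta-training classes
            shuffled[n_train:n_train + n_val],        # meta-validation classes
            shuffled[n_train + n_val:])               # meta-testing classes
```

Because the pools are disjoint at the class level, every meta-test task is built entirely from classes never seen during meta-training, which is what makes the few-shot evaluation meaningful.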
Hardware Specification: No. The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run its experiments. It mentions computational resources as a general limitation but gives no specifications for the experimental setup.
Software Dependencies: No. The paper mentions that its implementation and baselines are based on code from GitHub repositories, and states that "These minor differences are due to our used higher Pytorch version." However, it does not explicitly list software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA x.x) for its own methodology, which reproducibility requires.
Experiment Setup: Yes. "The task-specific optimizer is set as Adam optimizer with learning rate 0.01, and the meta-optimizer is set as Adam optimizer with learning rate 0.001. We use SGD with Nesterov momentum of 0.9 and weight decay of 0.0005. Each mini-batch consists of 8 episodes. The model was meta-trained for 60 epochs, with each epoch consisting of 1000 episodes. The learning rate was initially set to 0.1, and then changed to 0.006, 0.0012, and 0.00024 at epochs 20, 40 and 50, respectively. We set the hyper-parameter λ as 0.1. ...trained with M-SGD optimizer (batch size per meta-train domain = 32, batch size per meta-test domain = 16, lr = 0.0005, weight decay = 0.00005, momentum = 0.9) for 45K iterations. ...trained with AMSGrad (batch size per meta-train domain = 64, batch size per meta-test domain = 32, lr = 0.0005, weight decay = 0.0001) for 30K iterations, where the learning rate decayed at 5K, 12K, 15K and 20K iterations by factors of 5, 10, 50 and 100, respectively."
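The 60-epoch meta-training run described above uses a piecewise-constant learning-rate schedule. A dependency-free sketch of that schedule, assuming 0-based epoch indexing and that each rate takes effect at the stated epoch (the function name is an assumption; the rate values and change points are the paper's):

```python
def meta_train_lr(epoch):
    """Learning rate for the 60-epoch meta-training run: starts at 0.1,
    then drops to 0.006, 0.0012, and 0.00024 at epochs 20, 40, and 50."""
    if epoch < 20:
        return 0.1
    if epoch < 40:
        return 0.006
    if epoch < 50:
        return 0.0012
    return 0.00024
```

In a PyTorch setup this step pattern would typically be wired in via a per-epoch scheduler callback rather than hand-rolled, but the explicit function makes the quoted schedule unambiguous.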