SMILE: Sample-to-feature Mixup for Efficient Transfer Learning

Authors: Xingjian Li, Haoyi Xiong, Cheng-zhong Xu, Dejing Dou

TMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments have been done to verify the performance improvement made by SMILE, in comparison with a wide spectrum of transfer learning algorithms, including fine-tuning, L2-SP, DELTA, BSS, RIFLE, Co-Tuning and RegSL, even with mixup strategies combined. Ablation studies show that the vanilla sample-to-label mixup strategies could marginally increase the linearity in-between training samples but lack generalizability, while SMILE significantly improves the mixup effects in both label and feature spaces on both training and testing datasets. The empirical observations back up our design intuition and purposes.
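For reference, the "vanilla sample-to-label mixup" baseline that the ablation contrasts with SMILE is the standard mixup of Zhang et al.: convex-combine two inputs and their one-hot labels with a Beta-sampled coefficient. The sketch below is illustrative (function and parameter names are ours, not the paper's) and is not SMILE's sample-to-feature variant itself.

```python
import random

def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Vanilla sample-to-label mixup: draw lambda ~ Beta(alpha, alpha)
    and return the element-wise convex combination of the two inputs
    and of their one-hot labels. Names here are illustrative."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

SMILE replaces the label-space part of this interpolation with mixing in the feature space of the network, which the ablation finds generalizes better.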
Researcher Affiliation Collaboration Xingjian Li EMAIL Big Data Lab, Baidu Inc. State Key Lab of IOTSC, University of Macau; Haoyi Xiong EMAIL Big Data Lab, Baidu Inc.; Dejing Dou EMAIL Big Data Lab, Baidu Inc.; Chengzhong Xu EMAIL State Key Lab of IOTSC, University of Macau
Pseudocode Yes Algorithm 1: Deep Transfer Learning with SMILE
Open Source Code Yes Our code is available at https://github.com/lixingjian/SMILE.
Open Datasets Yes We conduct experiments on three popular object recognition datasets: CUB-200-2011 (Wah et al., 2011), Stanford Cars (Krause et al., 2013) and FGVC-Aircraft (Maji et al., 2013), which are intensively used in state-of-the-art transfer learning literature (Chen et al., 2019; Li et al., 2020; You et al., 2020). Each of these datasets contains about 6k–8k training samples. We use the ImageNet (Deng et al., 2009) pre-trained ResNet-50 (He et al., 2016) as the source model. ... We use the Places365 (Zhou et al., 2017) pre-trained ResNet-50 to perform fine-tuning on MIT-Indoors-67 (Quattoni & Torralba, 2009)... large scale dataset Food-101 (Bossard et al., 2014)... fine-grained sentiment classification task SST-5... the pre-trained model in this experiment is the base model of BERT (Devlin et al., 2018)
Dataset Splits Yes For the first group, we first randomly select 25% of all the categories from each of these standard datasets. Then we randomly sample 400 and 800 training samples from the selected categories. For the second group, we use all categories, while evaluating with 15% or 100% of training samples respectively, following the practice in existing baselines BSS (Chen et al., 2019) and Co-tuning (You et al., 2020). ... Each experiment is repeated five times and we report the average top-1 classification accuracy and standard deviations for uncertainty quantification.
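The low-resource sampling protocol described above (keep a random 25% of categories, then draw 400 or 800 training samples from them) can be sketched as follows; this is a hypothetical reconstruction, and the function name, seeding, and index-based interface are our assumptions rather than the paper's released code.

```python
import random

def make_low_resource_split(labels, category_frac=0.25, n_train=400, seed=0):
    """Hypothetical sketch of the reported protocol: randomly keep
    category_frac of the categories, then sample n_train training
    indices (400 or 800 in the paper) from the kept categories."""
    rng = random.Random(seed)  # fixed seed so repeated runs are comparable
    cats = sorted(set(labels))
    n_keep = max(1, int(len(cats) * category_frac))
    kept = set(rng.sample(cats, n_keep))
    pool = [i for i, y in enumerate(labels) if y in kept]
    return sorted(rng.sample(pool, min(n_train, len(pool))))
```

Repeating the call with five different seeds mirrors the five repetitions over which mean accuracy and standard deviation are reported.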
Hardware Specification No No specific hardware details (like GPU/CPU models, processor types, or memory amounts) are provided for running the experiments. The text only describes the training process and models used.
Software Dependencies No The paper mentions optimizers like SGD and Adam, but does not provide specific software library names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup Yes We train all models using SGD with the momentum of 0.9, weight decay of 10⁻⁴ and batch size of 48. We train 15,000 iterations for Food-101 considering its large scale and 9,000 iterations for the remaining datasets. The initial learning rate is set to 0.001 for MIT-Indoor-67 due to its high similarity with the pre-trained dataset Places365 and 0.01 for the remaining. The learning rate is divided by 10 after two-thirds of total iterations. ... We fine-tune the pre-trained BERT model with the batch size of 24 for 3 epochs, using the Adam optimizer with a learning rate of 2×10⁻⁵.
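The reported step schedule (initial learning rate divided by 10 after two-thirds of the total iterations) can be written as a small helper; the function name and defaults are ours, with the defaults taken from the setup above (9,000 iterations, base rate 0.01, or 0.001 for MIT-Indoor-67).

```python
def step_lr(step, total_steps=9000, base_lr=0.01):
    """Learning rate at a given iteration under the reported schedule:
    constant at base_lr, then divided by 10 after two-thirds of
    total_steps (15,000 for Food-101, 9,000 otherwise)."""
    drop_at = (2 * total_steps) // 3
    return base_lr if step < drop_at else base_lr / 10
```

For example, with the default 9,000 iterations the rate drops from 0.01 to 0.001 at iteration 6,000.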