SMILE: Sample-to-feature Mixup for Efficient Transfer Learning
Authors: Xingjian Li, Haoyi Xiong, Cheng-zhong Xu, Dejing Dou
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments have been done to verify the performance improvement made by SMILE, in comparison with a wide spectrum of transfer learning algorithms, including fine-tuning, L2-SP, DELTA, BSS, RIFLE, Co-Tuning and RegSL, even with mixup strategies combined. Ablation studies show that the vanilla sample-to-label mixup strategies could marginally increase the linearity in-between training samples but lack generalizability, while SMILE significantly improves the mixup effects in both label and feature spaces with both training and testing datasets. The empirical observations back up our design intuition and purposes. |
| Researcher Affiliation | Collaboration | Xingjian Li, Big Data Lab, Baidu Inc. and State Key Lab of IOTSC, University of Macau; Haoyi Xiong, Big Data Lab, Baidu Inc.; Dejing Dou, Big Data Lab, Baidu Inc.; Chengzhong Xu, State Key Lab of IOTSC, University of Macau |
| Pseudocode | Yes | Algorithm 1: Deep Transfer Learning with SMILE |
| Open Source Code | Yes | Our code is available at https://github.com/lixingjian/SMILE. |
| Open Datasets | Yes | We conduct experiments on three popular object recognition datasets: CUB-200-2011 (Wah et al., 2011), Stanford Cars (Krause et al., 2013) and FGVC-Aircraft (Maji et al., 2013), which are intensively used in the state-of-the-art transfer learning literature (Chen et al., 2019; Li et al., 2020; You et al., 2020). Each of these datasets contains about 6k–8k training samples. We use the ImageNet (Deng et al., 2009) pre-trained ResNet-50 (He et al., 2016) as the source model. ... We use the Places365 (Zhou et al., 2017) pre-trained ResNet-50 to perform fine-tuning on MIT-Indoors-67 (Quattoni & Torralba, 2009)... large scale dataset Food-101 (Bossard et al., 2014)... fine-grained sentiment classification task SST-5... the pre-trained model in this experiment is the base model of BERT (Devlin et al., 2018) |
| Dataset Splits | Yes | For the first group, we first randomly select 25% of all the categories from each of these standard datasets. Then we randomly sample 400 and 800 training samples from the selected categories. For the second group, we use all categories, while evaluating with 15% or 100% of the training samples respectively, following the practice in the existing baselines BSS (Chen et al., 2019) and Co-Tuning (You et al., 2020). ... Each experiment is repeated five times and we report the average top-1 classification accuracy and standard deviations for uncertainty quantification. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, processor types, or memory amounts) are provided for running the experiments. The text only describes the training process and models used. |
| Software Dependencies | No | The paper mentions optimizers like SGD and Adam, but does not provide specific software library names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x). |
| Experiment Setup | Yes | We train all models using SGD with a momentum of 0.9, weight decay of 10⁻⁴ and batch size of 48. We train 15,000 iterations for Food-101 considering its large scale and 9,000 iterations for the remaining datasets. The initial learning rate is set to 0.001 for MIT-Indoor-67 due to its high similarity with the pre-trained dataset Places365 and 0.01 for the remaining. The learning rate is divided by 10 after two-thirds of the total iterations. ... We fine-tune the pre-trained BERT model with a batch size of 24 for 3 epochs, using the Adam optimizer with a learning rate of 2×10⁻⁵. |
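The ablation in the Research Type row contrasts SMILE with vanilla sample-to-label mixup. For readers reproducing that baseline, here is a minimal NumPy sketch of vanilla mixup (convex combinations of paired inputs and one-hot labels); the function name, `alpha` default, and seeding are illustrative choices, not taken from the paper.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Vanilla sample-to-label mixup: mix random pairs of inputs
    and their one-hot labels with a Beta-distributed coefficient."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)            # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))          # random pairing within the batch
    x_mix = lam * x + (1 - lam) * x[perm]   # mix inputs
    y_mix = lam * y + (1 - lam) * y[perm]   # mix one-hot labels
    return x_mix, y_mix, lam
```

SMILE itself mixes in the feature space as well; this sketch covers only the sample-to-label baseline the ablation refers to.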
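The learning-rate schedule in the Experiment Setup row (constant rate, divided by 10 after two-thirds of the total iterations) can be sketched as a small step function; the function name and the example step counts are illustrative.

```python
def learning_rate(step, total_steps, base_lr=0.01):
    """Step schedule from the reported setup: keep base_lr for the
    first two-thirds of training, then divide it by 10."""
    drop_at = (2 * total_steps) // 3  # e.g. iteration 6,000 of 9,000
    return base_lr if step < drop_at else base_lr / 10
```

With `total_steps=9000` (the non-Food-101 datasets) the rate drops from 0.01 to 0.001 at iteration 6,000; for MIT-Indoor-67 one would pass `base_lr=0.001` instead.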
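The Dataset Splits row describes a low-data protocol: randomly select 25% of the categories, then randomly sample a fixed number of training examples from them. A hypothetical NumPy sketch of that procedure follows; the function name, seeding, and label layout are assumptions for illustration.

```python
import numpy as np

def low_data_split(labels, frac_classes=0.25, n_train=400, seed=0):
    """Sketch of the low-data split: pick a random fraction of the
    categories, then draw n_train sample indices from those categories."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    k = max(1, int(len(classes) * frac_classes))        # 25% of categories
    chosen = rng.choice(classes, size=k, replace=False)
    pool = np.flatnonzero(np.isin(labels, chosen))      # eligible samples
    idx = rng.choice(pool, size=min(n_train, len(pool)), replace=False)
    return chosen, idx
```

Repeating this five times with different seeds and averaging top-1 accuracy matches the uncertainty-quantification practice described in the same row.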