SMILE: Sample-to-feature Mixup for Efficient Transfer Learning

Authors: Xingjian Li, Haoyi Xiong, Cheng-zhong Xu, Dejing Dou

TMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments have been done to verify the performance improvement made by SMILE, in comparison with a wide spectrum of transfer learning algorithms, including fine-tuning, L2-SP, DELTA, BSS, RIFLE, Co-Tuning and RegSL, even with mixup strategies combined. Ablation studies show that the vanilla sample-to-label mixup strategies could marginally increase the linearity in-between training samples but lack generalizability, while SMILE significantly improves the mixup effects in both label and feature spaces on both training and testing datasets. The empirical observations back up our design intuition and purposes.
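For reference, the "vanilla sample-to-label mixup" baseline that the ablation contrasts with SMILE is the standard mixup of Zhang et al.: convex-combine two inputs and their one-hot labels with a Beta-sampled coefficient. The sketch below is illustrative (function and parameter names are ours, not the paper's) and is not SMILE's sample-to-feature variant itself.

```python
import random

def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Vanilla sample-to-label mixup: draw lambda ~ Beta(alpha, alpha)
    and return the element-wise convex combination of the two inputs
    and of their one-hot labels. Names here are illustrative."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

SMILE replaces the label-space part of this interpolation with mixing in the feature space of the network, which the ablation finds generalizes better.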
Researcher Affiliation Collaboration Xingjian Li EMAIL Big Data Lab, Baidu Inc. State Key Lab of IOTSC, University of Macau; Haoyi Xiong EMAIL Big Data Lab, Baidu Inc.; Dejing Dou EMAIL Big Data Lab, Baidu Inc.; Chengzhong Xu EMAIL State Key Lab of IOTSC, University of Macau
Pseudocode Yes Algorithm 1: Deep Transfer Learning with SMILE
Open Source Code Yes Our code is available at https://github.com/lixingjian/SMILE.
Open Datasets Yes We conduct experiments on three popular object recognition datasets: CUB-200-2011 (Wah et al., 2011), Stanford Cars (Krause et al., 2013) and FGVC-Aircraft (Maji et al., 2013), which are intensively used in state-of-the-art transfer learning literature (Chen et al., 2019; Li et al., 2020; You et al., 2020). Each of these datasets contains about 6k–8k training samples. We use the ImageNet (Deng et al., 2009) pre-trained ResNet-50 (He et al., 2016) as the source model. ... We use the Places365 (Zhou et al., 2017) pre-trained ResNet-50 to perform fine-tuning on MIT-Indoors-67 (Quattoni & Torralba, 2009)... large scale dataset Food-101 (Bossard et al., 2014)... fine-grained sentiment classification task SST-5... the pre-trained model in this experiment is the base model of BERT (Devlin et al., 2018)
Dataset Splits Yes For the first group, we first randomly select 25% of all the categories from each of these standard datasets. Then we randomly sample 400 and 800 training samples from the selected categories. For the second group, we use all categories, while evaluating with 15% or 100% of training samples respectively, following the practice in existing baselines BSS (Chen et al., 2019) and Co-tuning (You et al., 2020). ... Each experiment is repeated five times and we report the average top-1 classification accuracy and standard deviations for uncertainty quantification.
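The low-resource sampling protocol described above (keep a random 25% of categories, then draw 400 or 800 training samples from them) can be sketched as follows; this is a hypothetical reconstruction, and the function name, seeding, and index-based interface are our assumptions rather than the paper's released code.

```python
import random

def make_low_resource_split(labels, category_frac=0.25, n_train=400, seed=0):
    """Hypothetical sketch of the reported protocol: randomly keep
    category_frac of the categories, then sample n_train training
    indices (400 or 800 in the paper) from the kept categories."""
    rng = random.Random(seed)  # fixed seed so repeated runs are comparable
    cats = sorted(set(labels))
    n_keep = max(1, int(len(cats) * category_frac))
    kept = set(rng.sample(cats, n_keep))
    pool = [i for i, y in enumerate(labels) if y in kept]
    return sorted(rng.sample(pool, min(n_train, len(pool))))
```

Repeating the call with five different seeds mirrors the five repetitions over which mean accuracy and standard deviation are reported.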
Hardware Specification No No specific hardware details (like GPU/CPU models, processor types, or memory amounts) are provided for running the experiments. The text only describes the training process and models used.
Software Dependencies No The paper mentions optimizers like SGD and Adam, but does not provide specific software library names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup Yes We train all models using SGD with the momentum of 0.9, weight decay of 10⁻⁴ and batch size of 48. We train 15,000 iterations for Food-101 considering its large scale and 9,000 iterations for the remaining datasets. The initial learning rate is set to 0.001 for MIT-Indoor-67 due to its high similarity with the pre-trained dataset Places365 and 0.01 for the remaining. The learning rate is divided by 10 after two-thirds of total iterations. ... We fine-tune the pre-trained BERT model with the batch size of 24 for 3 epochs, using the Adam optimizer with a learning rate of 2×10⁻⁵.
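The reported step schedule (initial learning rate divided by 10 after two-thirds of the total iterations) can be written as a small helper; the function name and defaults are ours, with the defaults taken from the setup above (9,000 iterations, base rate 0.01, or 0.001 for MIT-Indoor-67).

```python
def step_lr(step, total_steps=9000, base_lr=0.01):
    """Learning rate at a given iteration under the reported schedule:
    constant at base_lr, then divided by 10 after two-thirds of
    total_steps (15,000 for Food-101, 9,000 otherwise)."""
    drop_at = (2 * total_steps) // 3
    return base_lr if step < drop_at else base_lr / 10
```

For example, with the default 9,000 iterations the rate drops from 0.01 to 0.001 at iteration 6,000.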