Gap Minimization for Knowledge Sharing and Transfer

Authors: Boyu Wang, Jorge A. Mendez, Changjian Shui, Fan Zhou, Di Wu, Gezheng Xu, Christian Gagné, Eric Eaton

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive evaluation on both transfer learning and multitask learning benchmark data sets shows that our methods outperform existing baselines. The experimental results are reported in Section 6.
Researcher Affiliation | Academia | Boyu Wang EMAIL Department of Computer Science, University of Western Ontario; Jorge A. Mendez EMAIL Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology; Changjian Shui EMAIL Institute Intelligence and Data, Université Laval; Fan Zhou EMAIL School of Transportation Science and Engineering, Beihang University; Di Wu EMAIL Department of Electrical and Computer Engineering, McGill University; Gezheng Xu EMAIL Department of Computer Science, University of Western Ontario; Christian Gagné EMAIL Institute Intelligence and Data, Université Laval; Eric Eaton EMAIL Department of Computer and Information Science, University of Pennsylvania
Pseudocode | Yes | Algorithm 1 gapBoost, Algorithm 2 gapBoostR, Algorithm 3 gapMTNN (one epoch)
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for their implementation.
Open Datasets | Yes | We evaluate gapBoost on two benchmark data sets for classification. The first data set we consider is 20 Newsgroups... The second data set we use is Office-Caltech (Gong et al., 2012)... For gapBoostR, we evaluate it on five benchmark data sets: Concrete, Housing, Auto MPG, Diabetes, and Friedman. The first three data sets are from the UCI Machine Learning Repository, and the Diabetes data set is from (Efron et al., 2004). Following (Pardoe and Stone, 2010), for each data set... In the Friedman data set (Friedman, 1991)... Next, we examined gapMTNN on four benchmark data sets: Digits (Shui et al., 2019), PACS (Li et al., 2017), Office-31 (Saenko et al., 2010), and Office-Home (Venkateswara et al., 2017).
Dataset Splits | Yes | For the Digits data set, we randomly select 3K, 5K and 8K instances for training... For the PACS data set, we randomly selected 10%, 15% and 20% of the total training data for training... For the Office-31 and Office-Home data sets, we adopted the same train/validation/test split strategy of (Long et al., 2017; Zhou et al., 2021a). We randomly select 5%, 10%, 20% of the instances from the training set to train the model. The validation set is used for developing the model while we test on the rest of the test data set... For each data set, we used all source data and a small amount of target data (10% on 20 Newsgroups and 10 points on Office-Caltech) as training data, and used the rest of the target data for testing. We repeated all experiments over 20 different random train/test splits...
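The repeated random-split protocol quoted above can be sketched in a few lines of Python. The helper name and seeding scheme are illustrative assumptions, not the authors' code:

```python
import random

def random_split(n_instances, test_fraction, seed):
    """Deterministically shuffle indices and split them into train/test."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    n_train = round(n_instances * (1.0 - test_fraction))
    return indices[:n_train], indices[n_train:]

# 20 repetitions over different random train/test splits, as in the report
# (e.g., 10% of the target data used for training, the rest for testing).
splits = [random_split(1000, 0.9, seed) for seed in range(20)]
```

Seeding each repetition separately makes every split reproducible on its own, which matters when results are averaged over the 20 runs.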
Hardware Specification | Yes | All the experiments were repeated 10 times on an Intel Gold 6148 Skylake 2.4 GHz CPU and 2 x NVIDIA V100 SXM2 (16G memory) GPUs.
Software Dependencies | Yes | The main software packages used for the experiments are PyTorch version 1.0, Torchvision version 0.2, Python version 3.6, and the Python CVXPY package 1.0.
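For reproduction, the stated dependencies could be pinned in a requirements file. The report gives only major.minor versions, so the patch releases below are assumptions:

```
# Python 3.6 environment (patch versions assumed, not stated in the report)
torch==1.0.0
torchvision==0.2.0
cvxpy==1.0.0
```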
Experiment Setup | Yes | The number of boosting iterations is set to 20. The hyper-parameters of gapBoost were set as γmax = 1/NT as per Remark 39, ρT = 0, which corresponds to no punishment for the target data, and ρS = log(1/2)... We trained the networks using the Adam optimizer, with an initial learning rate of 2e-4, decaying by 5% every 5 epochs, for a total of 120 epochs. Additional details on the experimental implementation can be found in Appendix C... For Office-31 we use a training batch size of 16, and for the Office-Home data set we use a training batch size of 24. We adopt the Adam optimizer to train the model for the three data sets. The initial learning rate was set to 2e-4. During the training process, we decay the learning rate by 5% every 5 epochs. To enforce an L2 regularization, we also enable the weight decay option in Adam provided by PyTorch. The relation coefficients α are initialized to 1/#tasks for each class, then optimized dynamically as training goes on. We set a weight of 0.1 to regularize the semantic loss and the marginal alignment objective.
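The stated schedule (Adam at 2e-4, decayed by 5% every 5 epochs over 120 epochs) implies a simple multiplicative step decay, and the relation coefficients start uniform over tasks. A framework-free sketch, with illustrative function names:

```python
def scheduled_lr(epoch, base_lr=2e-4, decay=0.05, step=5):
    """Learning rate in effect at a given epoch: a 5% multiplicative
    decay is applied once every `step` epochs."""
    return base_lr * (1.0 - decay) ** (epoch // step)

def init_alpha(num_tasks):
    """Relation coefficients initialized uniformly to 1/#tasks, as stated."""
    return [1.0 / num_tasks] * num_tasks
```

In PyTorch this schedule corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.95)`, with L2 regularization supplied via Adam's `weight_decay` argument.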