k-NN as a Simple and Effective Estimator of Transferability

Authors: Moein Sorkhei, Christos Matsoukas, Johan Fredin Haslum, Emir Konuk, Kevin Smith

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted an extensive evaluation involving over 42,000 experiments comparing 23 transferability metrics across 16 different datasets to assess their ability to predict transfer performance for image classification tasks.
Researcher Affiliation | Academia | Moein Sorkhei, KTH Royal Institute of Technology, Stockholm, Sweden; Science for Life Laboratory, Stockholm, Sweden
Pseudocode | No | The paper describes mathematical formulations for various metrics (NCE, LEEP, N-LEEP, GBC, FID, EMD, IDS) but does not include any structured pseudocode or algorithm blocks.
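For illustration, the mathematical formulations of these metrics are typically short enough to transcribe directly into code. Below is a minimal pure-Python sketch of LEEP (Nguyen et al., 2020), one of the metrics named above, assuming the source model's softmax outputs and the target labels are already available; the function and argument names are our own, not the paper's.

```python
import math

def leep(source_probs, target_labels):
    """LEEP score: average log-likelihood of the target labels under an
    empirical conditional distribution built from the source model's
    soft predictions. Pure-Python sketch, not the authors' code.

    source_probs:  list of softmax vectors over the source classes
    target_labels: list of target-class indices, one per sample
    """
    n = len(source_probs)
    num_z = len(source_probs[0])
    classes = sorted(set(target_labels))

    # Empirical joint distribution P(y, z) over target labels y
    # and source classes z.
    joint = {y: [0.0] * num_z for y in classes}
    for probs, y in zip(source_probs, target_labels):
        for z, p in enumerate(probs):
            joint[y][z] += p / n

    # Marginal P(z) and conditional P(y | z).
    pz = [sum(joint[y][z] for y in classes) for z in range(num_z)]
    cond = {y: [joint[y][z] / pz[z] for z in range(num_z)] for y in classes}

    # Average log-likelihood of the target labels (higher is better).
    return sum(
        math.log(sum(cond[y][z] * probs[z] for z in range(num_z)))
        for probs, y in zip(source_probs, target_labels)
    ) / n
```

When the source predictions perfectly separate the target classes, each per-sample likelihood is 1 and the score reaches its maximum of 0; less informative source models give more negative scores.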
Open Source Code | No | The paper mentions using PyTorch for training models but does not provide a link to their own source code, nor does it explicitly state that their code will be released or is available in supplementary materials.
Open Datasets | Yes | We apply transfer learning across a diverse set of 16 image classification datasets. For the source domains, we selected ImageNet (Deng et al., 2009), iNat2017 (Van Horn et al., 2018), Places365 (Zhou et al., 2017), and NABirds (Van Horn et al., 2015). As target datasets, we include well-known benchmarks such as CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009), Caltech-101 (Fei-Fei et al., 2004), Caltech-256 (Griffin et al., 2007), Stanford Dogs (Khosla et al., 2011), Aircraft (Maji et al., 2013), NABirds (Van Horn et al., 2015), Oxford-IIIT Pet (Parkhi et al., 2012), SUN397 (Xiao et al., 2010), DTD (Cimpoi et al., 2014), AID (Xia et al., 2017), and APTOS2019 (Karthik, 2019).
Dataset Splits | Yes | For each dataset, either the official train/val/test splits were used, or we made the splits following Kornblith et al. (2019). ... Specifically, we split the training set (S) of the target into two disjoint subsets S1 and S2, comprising 80% and 20% of the training set. Subsequently, k-NN classification was performed on S2 using the k nearest neighbors from S1. The resulting k-NN accuracy served as the transferability score (to ensure reliability, we repeated the same procedure with 3-fold cross-validation on the training set, yielding identical results).
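The 80/20 split-and-classify procedure quoted above can be sketched in a few lines of pure Python, assuming features have already been extracted from the pretrained source model; the function name, brute-force distance search, and fixed seed are our own illustrative choices, not the authors' implementation.

```python
import random
from collections import Counter

def knn_transferability(features, labels, k=5, split=0.8, seed=0):
    """Estimate transferability as k-NN accuracy on the target's training set.

    features: list of feature vectors extracted from the pretrained model
    labels:   corresponding target-class labels
    The set is split into disjoint subsets S1 (80%) and S2 (20%), and S2 is
    classified by majority vote over its k nearest neighbors in S1.
    """
    idx = list(range(len(features)))
    random.Random(seed).shuffle(idx)
    cut = int(split * len(idx))
    s1, s2 = idx[:cut], idx[cut:]

    def dist(a, b):  # squared Euclidean distance
        return sum((x - y) ** 2 for x, y in zip(a, b))

    correct = 0
    for i in s2:
        # k nearest neighbors of sample i among S1, then majority vote
        nearest = sorted(s1, key=lambda j: dist(features[i], features[j]))[:k]
        votes = Counter(labels[j] for j in nearest)
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(s2)
```

In practice one would use an optimized neighbor search (e.g. a k-d tree or a vectorized distance matrix) rather than this O(|S1|·|S2|) loop, but the estimator itself is exactly this simple: a single feature extraction pass followed by nearest-neighbor voting, with no training.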
Hardware Specification | No | We acknowledge the Berzelius computational resources provided by the Knut and Alice Wallenberg Foundation at the National Supercomputer Centre and the computational resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725. This mentions general computational resources and centers but lacks specific hardware details such as GPU/CPU models or memory specifications.
Software Dependencies | Yes | The Adam optimizer (Kingma & Ba, 2014) was used for CNNs and AdamW (Loshchilov & Hutter, 2017) for ViT-based architectures, and the training of models was done using PyTorch (Paszke et al., 2019).
Experiment Setup | Yes | Images were normalized and resized to 256×256, after which augmentations were applied: random color jittering, random horizontal flip, and random cropping of the rescaled image to 224×224. The Adam optimizer (Kingma & Ba, 2014) was used for CNNs and AdamW (Loshchilov & Hutter, 2017) for ViT-based architectures... After a grid search, the pretrained and the randomly-initialized models were trained with learning rates of 10⁻⁴ and 3×10⁻⁴ respectively, following an initial warm-up for 1,000 iterations. During training, the learning rate was dropped by a factor of 10 whenever training saturated, until it reached a final learning rate of 10⁻⁶ or 3×10⁻⁶ for pretrained or randomly-initialized models respectively. The checkpoint with the highest validation performance was chosen for final evaluation.
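The drop-by-10-on-saturation schedule described in this row can be sketched as a small framework-free helper. The paper's exact saturation criterion is not quoted here, so this sketch assumes one plausible reading, "no improvement in training loss for `patience` consecutive steps"; the class name and all parameters are illustrative, not the authors'.

```python
class PlateauLR:
    """Drop the learning rate by a factor of 10 whenever training
    saturates, stopping at a floor (e.g. lr=1e-4 down to floor=1e-6
    for pretrained models, per the setup above). Sketch only:
    'saturation' is approximated by a patience window on the loss.
    """

    def __init__(self, lr=1e-4, floor=1e-6, factor=0.1, patience=3):
        self.lr, self.floor, self.factor, self.patience = lr, floor, factor, patience
        self.best = float("inf")   # best loss seen so far
        self.bad_steps = 0         # steps without improvement

    def step(self, loss):
        """Record one training-loss observation; return the current lr."""
        if loss < self.best:
            self.best, self.bad_steps = loss, 0
        else:
            self.bad_steps += 1
            if self.bad_steps >= self.patience and self.lr > self.floor:
                self.lr = max(self.lr * self.factor, self.floor)
                self.bad_steps = 0
        return self.lr
```

In a real PyTorch training loop the same behavior is usually obtained with the built-in `torch.optim.lr_scheduler.ReduceLROnPlateau` rather than a hand-rolled class.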