Simplifying Knowledge Transfer in Pretrained Models
Authors: Siddharth Jain, Shyamgopal Karthik, Vineet Gandhi
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments across various tasks demonstrate the effectiveness of our proposed approach. In image classification, we improved the performance of ViT-B by approximately 1.4% through bidirectional knowledge transfer with ViT-T. For semantic segmentation, our method boosted all evaluation metrics by enabling knowledge transfer both within and across backbone architectures. In video saliency prediction, our approach achieved a new state-of-the-art. |
| Researcher Affiliation | Academia | Siddharth Jain EMAIL Center for Visual Information Technology International Institute of Information Technology, Hyderabad Shyamgopal Karthik EMAIL University of Tübingen Vineet Gandhi EMAIL Center for Visual Information Technology International Institute of Information Technology, Hyderabad |
| Pseudocode | Yes | Algorithm 1: Bi-KD Input: Training set X, label set Y, learning rate η, epochs Tmax, iterations Nmax, models f1 and f2 parameterized by θ1 and θ2 respectively |
| Open Source Code | Yes | The code is available at: https://github.com/Syd-J/Bi-KD |
| Open Datasets | Yes | ImageNet (Deng et al., 2009) consists of 1.2 million images for training and 50,000 images for validation. We report the results of our knowledge transfer between two or multiple models on the validation set. ADE20K (Zhou et al., 2017) provides 150 object and stuff categories, with 20,210 images in the training set and 2,000 images in the validation set. We use the validation set to evaluate our approach for knowledge transfer on semantic segmentation. DHF1K (Wang et al., 2018) is a benchmark dataset for video saliency prediction, comprising 600 videos in the training set and 100 videos in the validation set. We use the validation set for our evaluation. Hollywood-2 (Mathe & Sminchisescu, 2014) is the largest dataset for video saliency prediction in terms of the number of videos, containing 1,707 clips sourced from 69 Hollywood movies. |
| Dataset Splits | Yes | ImageNet (Deng et al., 2009) consists of 1.2 million images for training and 50,000 images for validation. ADE20K (Zhou et al., 2017) provides 150 object and stuff categories, with 20,210 images in the training set and 2,000 images in the validation set. DHF1K (Wang et al., 2018) is a benchmark dataset for video saliency prediction, comprising 600 videos in the training set and 100 videos in the validation set. Hollywood-2 (Mathe & Sminchisescu, 2014)... we use the predefined split of 823 videos for training and the remaining 884 videos for testing. |
| Hardware Specification | Yes | We implement all the networks and training procedures in Pytorch (Paszke et al., 2019), and conduct all experiments on a single NVIDIA RTX A6000. |
| Software Dependencies | No | We implement all the networks and training procedures in Pytorch (Paszke et al., 2019), and conduct all experiments on a single NVIDIA RTX A6000. ... The only exceptions are ViTs, for which we employ the default data augmentations and cosine scheduler provided by the timm (Wightman et al., 2019) library. Explanation: The paper mentions 'Pytorch' and 'timm library' with citations, but does not specify exact version numbers for these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | We use the Adam optimizer for image classification and video saliency prediction, while experiments on semantic segmentation utilize the AdamW optimizer. For all experiments, the learning rate and weight decay are set to 1e-6 and 1e-5 respectively, with the temperature parameter set to 1 in Equation 1. All models are trained in full precision for 20 epochs with a batch size of 128. We do not apply any data augmentations, learning rate schedulers, or layer-wise learning rate decay. |
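The pseudocode excerpt above (Algorithm 1: Bi-KD, two models f1 and f2, temperature set to 1 in the paper's Equation 1) describes a symmetric, bidirectional distillation objective. A minimal NumPy sketch of such a loss, assuming the standard temperature-softened KL formulation of knowledge distillation; the function names and the symmetric-sum form are illustrative, not the authors' exact implementation:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax along the last axis (numerically stable)."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=1.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in the standard distillation formulation."""
    p = softmax(teacher_logits, T)               # teacher soft targets
    log_q = np.log(softmax(student_logits, T))   # student log-probabilities
    return (T ** 2) * np.mean(np.sum(p * (np.log(p) - log_q), axis=-1))

def bi_kd_loss(logits1, logits2, T=1.0):
    """Bidirectional transfer: each model distills from the other.
    In an actual training loop each term's teacher side would be
    detached so gradients flow only into the current student."""
    return kd_loss(logits1, logits2, T) + kd_loss(logits2, logits1, T)

# Example: logits from two models on one sample (hypothetical values).
z1 = np.array([[2.0, 0.5, -1.0]])
z2 = np.array([[1.5, 1.0, -0.5]])
loss = bi_kd_loss(z1, z2, T=1.0)
```

By construction the objective is symmetric in the two models and vanishes when their predictive distributions coincide, consistent with the report's note that transfer runs in both directions (e.g. ViT-B with ViT-T).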