reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Comprehensive Survey on Knowledge Distillation

Authors: Amir M. Mansourian, Rozhan Ahmadi, Masoud Ghafouri, Amir Mohammad Babaei, Elaheh Badali Golezani, Zeynab Yasamani Ghamchi, Vida Ramezanian, Alireza Taherian, Kimia Dinashi, Amirali Miri, Shohreh Kasaei

TMLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this work, existing methods are compared for two well-known tasks: classification and semantic segmentation, as almost all the proposed distillation methods evaluate their results on these two tasks. For classification, the ImageNet and CIFAR-100 datasets, and for semantic segmentation, the Cityscapes and Pascal VOC datasets are used, as these two datasets are widely employed in existing methods. ... Tables 12, 13, 14, 15, and 16 report the performance of various distillation methods across different domains, tasks, and datasets. Where available, we also include results for cases where the teacher and student models have either similar or different architectures. This information enables meaningful comparison, as it demonstrates the impact of varying dataset sizes and the robustness of each method to architectural differences.
Researcher Affiliation	Academia	Amir M. Mansourian EMAIL Sharif University of Technology Rozhan Ahmadi EMAIL Sharif University of Technology Masoud Ghafouri EMAIL Sharif University of Technology Amir Mohammad Babaei EMAIL Sharif University of Technology Elaheh Badali Golezani EMAIL Sharif University of Technology Zeynab Yasamani Ghamchi EMAIL Sharif University of Technology Vida Ramezanian EMAIL Sharif University of Technology Alireza Taherian EMAIL Sharif University of Technology Kimia Dinashi EMAIL Sharif University of Technology Amirali Miri EMAIL Sharif University of Technology Shohreh Kasaei EMAIL Sharif University of Technology
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks. It describes methods conceptually using natural language and mathematical formulations (e.g., equations 1-7) along with diagrams (e.g., Figure 1, Figure 2, Figure 3), but no step-by-step algorithmic procedures.
Open Source Code	Yes	Github page of the project: https://github.com/IPL-Sharif/KD_Survey
Open Datasets	Yes	For classification, the ImageNet and CIFAR-100 datasets, and for semantic segmentation, the Cityscapes and Pascal VOC datasets are used, as these two datasets are widely employed in existing methods. ... The datasets used are: GLUE, Multilingual NER, Dolly Eval, IMDB, GSM8K, CrossFit, CommonsenseQA, SVAMP, Universal NER Benchmark, OpenBookQA, BBH, StrategyQA, and MATH.
Dataset Splits	Yes	ImageNet is a large-scale dataset designed for visual object recognition tasks, consisting of over 1.2 million training images, 50,000 validation images, and 100,000 testing images across 1,000 object classes. CIFAR-100 is composed of 32 × 32 images taken from 100 classes, with 50,000/10,000 images for training/testing... Cityscapes is designed for understanding urban scenes and includes 2,975/500/1,525 images for training/validation/testing in 19 classes. The Pascal VOC dataset comprises 1,464/1,449/1,456 images for train/val/test, with 21 classes.
Hardware Specification	No	The paper acknowledges the role of hardware in experimental results by stating: "These factors are highly dependent on the hardware resources employed in each method, which is the determining factor in the final results." However, it does not explicitly describe the specific hardware (e.g., GPU models, CPU types, or memory) used by the authors for the work presented in this survey paper.
Software Dependencies	No	The paper is a comprehensive survey and does not describe a new methodology implemented by the authors that would require specific software dependencies with version numbers. While it references various software and models (e.g., BERT, GPT-3) used in other research, it does not specify any software versions utilized by the authors for their own work.
Experiment Setup	No	As a survey paper, the document reviews and synthesizes information from existing research. It does not describe an experimental setup, hyperparameters, model initialization, or training configurations for any new experiments conducted by the authors in this paper.