Can Students Beyond the Teacher? Distilling Knowledge from Teacher’s Bias

Authors: Jianhua Zhang, Yi Gao, Ruyu Liu, Xu Cheng, Houxiang Zhang, Shengyong Chen

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental. "Experiments demonstrate that our strategy, as a plug-and-play module, is versatile across various mainstream KD frameworks." We conducted experiments on three classification datasets, CIFAR-10 (Krizhevsky, Hinton et al. 2009), CIFAR-100 (Krizhevsky, Hinton et al. 2009), and ImageNet-1K (Russakovsky et al. 2015), as well as on an object detection dataset, MS-COCO (Lin et al. 2014).
Researcher Affiliation: Academia. (1) School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, 300384, China; (2) School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121, China; (3) Department of Technology, Management and Economics, Technical University of Denmark, Lyngby, Denmark; (4) Norwegian University of Science and Technology.
Pseudocode: No. The paper describes the methodology using natural language, mathematical equations, and diagrams (Figures 1, 2, and 3), but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code: Yes. Code: https://github.com/smartyige/BTKD
Open Datasets: Yes. "We conducted experiments on three classification datasets, CIFAR-10 (Krizhevsky, Hinton et al. 2009), CIFAR-100 (Krizhevsky, Hinton et al. 2009), and ImageNet-1K (Russakovsky et al. 2015), as well as on an object detection dataset, MS-COCO (Lin et al. 2014)."
Dataset Splits: Yes. Experiments cover CIFAR-10 (Krizhevsky, Hinton et al. 2009), CIFAR-100 (Krizhevsky, Hinton et al. 2009), ImageNet-1K (Russakovsky et al. 2015), and MS-COCO (Lin et al. 2014). Top-1 and top-5 are the standard metrics for classification accuracy on the validation set. Results on MS-COCO are based on Faster R-CNN (Ren et al. 2015) with FPN (Lin et al. 2017); AP is evaluated on val2017.
Hardware Specification: No. The paper does not provide any specific details about the hardware (e.g., GPU, CPU models, memory) used to conduct the experiments.
Software Dependencies: No. The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) that would be needed to replicate the experiments.
Experiment Setup: No. "We have the following dynamic adjustment coefficient as γ = e / E," where e represents the current training iteration and E denotes the total training iterations. "In order to shorten the training cycles, we employed a method for dynamically adjusting the student's learning focus."
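The coefficient γ = e / E quoted above can be sketched in a few lines. This is a minimal illustration only: the function names and the way γ blends two loss terms are assumptions for exposition, not taken from the paper or its released code.

```python
# Sketch of the dynamic adjustment coefficient gamma = e / E,
# where e is the current training iteration and E the total number
# of iterations. Names and the blending usage are illustrative
# assumptions, not the authors' implementation.

def dynamic_gamma(e: int, E: int) -> float:
    """Linearly increase the coefficient from 0 to 1 over training."""
    return e / E

def blended_loss(loss_a: float, loss_b: float, e: int, E: int) -> float:
    """Hypothetical usage: shift the student's focus from one loss
    term to another as training progresses."""
    g = dynamic_gamma(e, E)
    return (1 - g) * loss_a + g * loss_b

print(dynamic_gamma(50, 100))  # 0.5: halfway through training
```

Under this reading, the student weights one objective early in training and gradually shifts toward the other, which matches the paper's stated goal of dynamically adjusting the student's learning focus.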