Neural Collapse Inspired Knowledge Distillation

Authors: Shuoxi Zhang, Zijian Song, Kun He

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments demonstrate that NCKD is simple yet effective, improving the generalization of all distilled student models and achieving state-of-the-art accuracy. We conduct extensive experiments to evaluate the effectiveness of NCKD across various benchmarks. Our method not only outperforms state-of-the-art distillation techniques on multiple vision tasks but also demonstrates its versatility as a plug-and-play loss that can be integrated into other popular distillation methods to enhance their performance. Evidence sections: Experiments, Baselines, Main Results (CIFAR-100, ImageNet-1k, MS-COCO), Extensions, Visualization, Ablation Study.
Researcher Affiliation | Academia | School of Computer Science and Technology, Huazhong University of Science and Technology
Pseudocode | No | The paper describes the proposed method using equations and textual explanations, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The detailed implementation of the experiments is provided in the appendix, available at https://arxiv.org/abs/2412.11788. This link refers to the paper itself on arXiv, not a separate code repository, and the text does not explicitly state that source code is released.
Open Datasets | Yes | We conduct extensive experiments to evaluate the effectiveness of NCKD across various benchmarks. Our method not only outperforms state-of-the-art distillation techniques on multiple vision tasks but also demonstrates its versatility as a plug-and-play loss that can be integrated into other popular distillation methods to enhance their performance. To validate the effectiveness of our approach, we compared NCKD against a range of state-of-the-art distillation methods on CIFAR-100. To validate the effectiveness of our method on large-scale vision tasks, we conducted experiments on the ImageNet-1k dataset. We verify the efficacy of the proposed NC-inspired loss in knowledge distillation tasks for object detection on the COCO dataset.
Dataset Splits | Yes | Our experiments included both similar-architecture and cross-architecture distillation to demonstrate the universality of our method. As shown in Table 1, NCKD outperformed all existing baselines, achieving an average accuracy of 75.10%. Additionally, when we integrated our NC-inspired losses as a plug-in module into two mainstream methods, CRD and SimKD, we observed a significant improvement in distillation performance. These results confirm the effectiveness of our approach in enhancing distillation generalization and highlight its versatility as a plug-and-play module suitable for various distillation frameworks and real-world applications. On ImageNet-1k, to validate the effectiveness of our method on large-scale vision tasks, we conducted experiments using both similar-architecture (ResNet34/ResNet18) and cross-architecture (ResNet50/MobileNet) network pairs. On MS-COCO, we verify the efficacy of the proposed NC-inspired loss in knowledge distillation tasks for object detection.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks.
Experiment Setup | No | The detailed implementation of the experiments is provided in the appendix, available at https://arxiv.org/abs/2412.11788. The main text of the paper does not contain specific experimental setup details like hyperparameter values or training configurations; these details are deferred to the appendix.
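The assessment above repeatedly cites NCKD's role as a "plug-and-play loss" added on top of existing distillation objectives. A minimal sketch of that pattern is shown below: a standard Hinton-style knowledge-distillation term combined with a task loss and a placeholder penalty slot. The NC-inspired term itself is not specified in this report, so `nc_penalty` is a hypothetical stand-in, and all function names and weights (`beta`, `gamma`) are illustrative assumptions, not the paper's implementation.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stable)."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Standard distillation term: T^2 * KL(p_teacher || p_student),
    with both distributions softened by temperature T."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * (math.log(pt + 1e-12) - math.log(ps + 1e-12))
             for pt, ps in zip(p_t, p_s))
    return T * T * kl

def total_loss(student_logits, teacher_logits, ce_loss, nc_penalty,
               beta=1.0, gamma=1.0):
    """Plug-and-play combination: task loss + weighted KD term + a slot
    (`nc_penalty`) standing in for the paper's NC-inspired loss, which
    is not detailed in this report."""
    return ce_loss + beta * kd_loss(student_logits, teacher_logits) + gamma * nc_penalty
```

Because the extra term enters the objective additively, the same pattern applies when bolting it onto other frameworks such as CRD or SimKD, which matches how the report describes the integration experiments.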