Neural Collapse Inspired Knowledge Distillation
Authors: Shuoxi Zhang, Zijian Song, Kun He
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments demonstrate that NCKD is simple yet effective, improving the generalization of all distilled student models and achieving state-of-the-art accuracy. "We conduct extensive experiments to evaluate the effectiveness of NCKD across various benchmarks. Our method not only outperforms state-of-the-art distillation techniques on multiple vision tasks but also demonstrates its versatility as a plug-and-play loss that can be integrated into other popular distillation methods to enhance their performance." Relevant section headings: Experiments, Baselines, Main Results (CIFAR-100, ImageNet-1k, MS-COCO), Extensions, Visualization, Ablation Study. |
| Researcher Affiliation | Academia | School of Computer Science and Technology, Huazhong University of Science and Technology EMAIL |
| Pseudocode | No | The paper describes the proposed method using equations and textual explanations, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The detailed implementation of the experiments is provided in the appendix, available at https://arxiv.org/abs/2412.11788. This link refers to the paper itself on arXiv, not a separate code repository, and the text does not explicitly state that source code is released. |
| Open Datasets | Yes | "We conduct extensive experiments to evaluate the effectiveness of NCKD across various benchmarks." The paper evaluates on publicly available datasets: "To validate the effectiveness of our approach, we compared NCKD against a range of state-of-the-art distillation methods" on CIFAR-100; "To validate the effectiveness of our method on large-scale vision tasks, we conducted experiments on the ImageNet-1k dataset"; and "We verify the efficacy of the proposed NC-inspired loss in knowledge distillation tasks for object detection on the COCO dataset." |
| Dataset Splits | Yes | "Our experiments included both similar-architecture and cross-architecture distillation to demonstrate the universality of our method. As shown in Table 1, NCKD outperformed all existing baselines, achieving an average accuracy of 75.10%. Additionally, when we integrated our NC-inspired losses as a plug-in module into two mainstream methods, CRD and SimKD, we observed a significant improvement in distillation performance." For ImageNet-1k: "we conducted experiments on the ImageNet-1k dataset, using both similar-architecture (ResNet34/ResNet18) and cross-architecture (ResNet50/MobileNet) network pairs." For MS-COCO: "We verify the efficacy of the proposed NC-inspired loss in knowledge distillation tasks for object detection on the COCO dataset." |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks. |
| Experiment Setup | No | The detailed implementation of the experiments is provided in the appendix, available at https://arxiv.org/abs/2412.11788. The main text of the paper does not contain specific experimental setup details like hyperparameter values or training configurations; these details are deferred to the appendix. |