Enhancing Logits Distillation with Plug&Play Kendall’s $τ$ Ranking Loss
Authors: Yuchen Guan, Runxi Cheng, Kang Liu, Chun Yuan
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on CIFAR-100, ImageNet, and COCO datasets, as well as various CNN and ViT teacher-student architecture combinations, demonstrate that our plug-and-play ranking loss consistently boosts the performance of multiple distillation baselines. |
| Researcher Affiliation | Academia | Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China. Correspondence to: Kang Liu <EMAIL>, Chun Yuan <EMAIL>. |
| Pseudocode | Yes | A.4. Algorithm Algorithm 1 Plug-and-Play Ranking Loss for Logit Distillation |
| Open Source Code | Yes | Code is available at https://github.com/OvernighTea/RankingLoss-KD |
| Open Datasets | Yes | 1) CIFAR-100 (Krizhevsky et al., 2009) is a significant dataset for image classification, comprising 100 categories, with 50,000 training images and 10,000 test images. 2) ImageNet (Russakovsky et al., 2015) is a large-scale dataset utilized for image classification, comprising 1,000 categories, with approximately 1.28 million training images and 50,000 test images. 3) MS-COCO (Lin et al., 2014) is a mainstream dataset for object detection comprising 80 categories, with 118,000 training images and 5,000 test images. |
| Dataset Splits | Yes | 1) CIFAR-100 (Krizhevsky et al., 2009) is a significant dataset for image classification, comprising 100 categories, with 50,000 training images and 10,000 test images. 2) ImageNet (Russakovsky et al., 2015) is a large-scale dataset utilized for image classification, comprising 1,000 categories, with approximately 1.28 million training images and 50,000 test images. 3) MS-COCO (Lin et al., 2014) is a mainstream dataset for object detection comprising 80 categories, with 118,000 training images and 5,000 test images. |
| Hardware Specification | Yes | We utilize 1 NVIDIA GeForce RTX 4090 to train models on CIFAR-100 and 4 NVIDIA GeForce RTX 4090 for training on ImageNet. |
| Software Dependencies | No | We employ SGD (Sutskever et al., 2013) as the optimizer... We use the AdamW optimizer... |
| Experiment Setup | Yes | We set the batch size to 64 for CIFAR-100, 512 for ImageNet and 8 for COCO. We employ SGD (Sutskever et al., 2013) as the optimizer, with the number of epochs and learning rate settings consistent with the comparative baselines. The hyper-parameters α, β in Eq. 6 are set to be the same as the compared baselines to maintain fairness, and γ is set equal to α. ... We use the AdamW optimizer and train for 300 epochs with an initial learning rate of 5e-4 and a weight decay of 0.05. The minimum learning rate is 5e-6, and the patch size is 16. We set α = 1, β = 1, γ = 0.5, and batch size is 128. |
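To make the "plug-and-play Kendall's τ ranking loss" idea concrete, the sketch below implements one common way to build such a loss: Kendall's τ over teacher/student logit pairs, with the non-differentiable pairwise sign replaced by a tanh relaxation. The function names, the `temperature` parameter, and the exact tanh surrogate are illustrative assumptions for this note, not the paper's verbatim formulation (see the paper's Eq. 6 and Algorithm 1 for the actual definition).

```python
import numpy as np

def soft_kendall_tau(student_logits, teacher_logits, temperature=1.0):
    """Smooth surrogate for Kendall's tau between two logit vectors.

    Classic Kendall's tau averages sign(s_i - s_j) * sign(t_i - t_j)
    over all class pairs i < j. Replacing sign() with tanh(./T) makes
    the statistic differentiable in the student logits, so it can be
    added to a distillation objective. Illustrative sketch only.
    """
    s = np.asarray(student_logits, dtype=float)  # student logits, shape (K,)
    t = np.asarray(teacher_logits, dtype=float)  # teacher logits, shape (K,)
    k = s.shape[0]
    iu = np.triu_indices(k, k=1)                 # all pairs with i < j
    ds = (s[:, None] - s[None, :])[iu]           # pairwise student differences
    dt = (t[:, None] - t[None, :])[iu]           # pairwise teacher differences
    # Concordant pairs (same ordering) contribute values near +1,
    # discordant pairs near -1; the mean lies in (-1, 1).
    return np.mean(np.tanh(ds / temperature) * np.tanh(dt / temperature))

def ranking_loss(student_logits, teacher_logits, temperature=1.0):
    """Loss to minimize: maximizing tau is minimizing 1 - tau."""
    return 1.0 - soft_kendall_tau(student_logits, teacher_logits, temperature)
```

In a plug-and-play setting this term would simply be added, weighted by γ, to an existing distillation objective (e.g. cross-entropy plus KL distillation weighted by α and β, matching the α/β/γ hyper-parameters reported above).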