Asymmetric Decision-Making in Online Knowledge Distillation: Unifying Consensus and Divergence

Authors: Zhaowei Chen, Borui Zhao, Yuchen Ge, Yuhao Chen, Renjie Song, Jiajun Liang

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate that ADM consistently surpasses existing OKD methods across various online knowledge distillation settings, and it also achieves superior results when applied to offline knowledge distillation, semantic segmentation, and diffusion distillation. Comprehensive experiments verify its effectiveness, with results reported on CIFAR-100, ImageNet, and semantic segmentation benchmarks, consistently outperforming state-of-the-art OKD methods.
Researcher Affiliation Collaboration JIIOV Technology; University of Southern California. Correspondence to: Zhaowei Chen <EMAIL>, Jiajun Liang <EMAIL>.
Pseudocode No The paper describes the methodology using mathematical formulations and descriptive text, such as equations (1) through (10) and detailed explanations in Section 4.1 'Asymmetric Decision-Making in Online Knowledge Distillation'. However, it does not include a clearly labeled pseudocode block or algorithm steps formatted like code.
Open Source Code No The paper does not contain any explicit statement about the release of source code, a direct link to a code repository, or mention that code is provided in supplementary materials.
Open Datasets Yes Datasets. We validated ADM on the following datasets: CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), ImageNet (Russakovsky et al., 2015), Cityscapes (Cordts et al., 2016).
Dataset Splits Yes CIFAR-100 (Krizhevsky et al., 2009) consists of 32×32 color images drawn from 100 classes, split into 50k train and 10k test images. ImageNet (Russakovsky et al., 2015) is a large-scale image classification dataset that contains 1k classes, with about 1.28 million training images and 50k validation images. Cityscapes (Cordts et al., 2016) is a dataset for urban scene parsing comprising 5000 meticulously annotated images, distributed as follows: 2975 for training, 500 for validation, and 1525 for testing.
Hardware Specification No The paper reports FLOPs for ResNet-34 and ResNet-18, implying computational resources were used, but it does not specify any particular hardware components such as GPU models, CPU types, or memory configurations used for running the experiments. For example, 'ResNet-34 has 3.67G FLOPs and ResNet-18 has 1.82G FLOPs. ADM only adds 0.52M FLOPs.' This refers to model complexity, not execution hardware.
Software Dependencies No The paper does not provide specific software dependencies with version numbers, such as Python versions, deep learning frameworks (e.g., PyTorch, TensorFlow) with their versions, or specific library versions used for implementation.
Experiment Setup Yes Implementation Details. Following the settings of previous methods (Qian et al., 2022; Tian et al., 2019), the batch size, epochs, learning rate decay rate, and weight decay rate are 256/128, 100/300, 0.1/0.1, and 0.0001/0.0005, respectively, on ImageNet/CIFAR-100. The initial learning rate is 0.1 on ImageNet, and 0.01 for MobileNetV2 and 0.1 for the other students on CIFAR-100. The learning rate drops every 30 epochs on ImageNet, and at epochs 140, 200, and 250 on CIFAR-100. The optimizer is Stochastic Gradient Descent (SGD) with momentum 0.9. Following OKD conventions, we use a fixed temperature T of 1.0 and loss weight λ of 1.0 for all experiments. We set α = 0.01, β = 0.01, γ = 1.0 on CIFAR-100 because of its limited spatial information, and α = 0.2, β = 0.6, γ = 0.01 on ImageNet for all OKD experiments.
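For readers reproducing this setup, the reported hyperparameters and learning-rate schedules can be summarized as a small configuration sketch. This is an illustrative reconstruction from the quoted text above, not the authors' released code; the `CONFIG` dict and `lr_at_epoch` helper are assumed names.

```python
# Hypothetical sketch of the training schedules quoted in "Implementation Details".
# Values come from the paper's text; names/structure are illustrative only.

CONFIG = {
    "imagenet": {
        "batch_size": 256, "epochs": 100, "init_lr": 0.1,
        "weight_decay": 1e-4, "decay_rate": 0.1,
        "step": 30,  # learning rate drops every 30 epochs
    },
    "cifar100": {
        "batch_size": 128, "epochs": 300, "init_lr": 0.1,  # 0.01 for MobileNetV2
        "weight_decay": 5e-4, "decay_rate": 0.1,
        "milestones": [140, 200, 250],  # epochs at which the LR drops
    },
}

def lr_at_epoch(dataset: str, epoch: int) -> float:
    """Return the SGD learning rate (momentum 0.9) at a given epoch."""
    cfg = CONFIG[dataset]
    lr = cfg["init_lr"]
    if "step" in cfg:
        # ImageNet: multiply by the decay rate every `step` epochs.
        lr *= cfg["decay_rate"] ** (epoch // cfg["step"])
    else:
        # CIFAR-100: multiply by the decay rate at each passed milestone.
        lr *= cfg["decay_rate"] ** sum(epoch >= m for m in cfg["milestones"])
    return lr
```

For example, `lr_at_epoch("cifar100", 150)` gives 0.01 (one drop, after epoch 140), and `lr_at_epoch("imagenet", 60)` gives 0.001 (two drops).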