Maintaining Fairness in Logit-based Knowledge Distillation for Class-Incremental Learning

Authors: Zijian Gao, Shanhao Han, Xingxing Zhang, Kele Xu, Dulan Zhou, Xinjun Mao, Yong Dou, Huaimin Wang

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We critically reassess the overlooked sub-optimality of vanilla KD through comprehensive empirical evaluations and analyses, revealing the conflict between learning and anti-forgetting caused by the neglected interplay and rigid exact match-based KD.
Researcher Affiliation Academia (1) College of Computer Science and Technology, National University of Defense Technology, Changsha, China. (2) State Key Laboratory of Complex & Critical Software Environment, Changsha, China. (3) School of Computer Science, Tsinghua University, Beijing, China.
Pseudocode Yes Figure 4 illustrates the schematic diagram of our method, and Algorithm 1 provides the pseudo-code of our method in a PyTorch-like style (Paszke et al. 2019).
Open Source Code Yes Code: https://github.com/Zi-Jian-Gao/Maintaining-Fairness-in-LKD-for-CIL
Open Datasets Yes We conducted experiments in both train-from-scratch and train-from-half scenarios on three widely used benchmarks: CIFAR-100 (Krizhevsky and Hinton 2009), ImageNet-Subset (Hou et al. 2019), and TinyImageNet (Le and Yang 2015).
Dataset Splits Yes Specifically, we equally divided the classes into five tasks and continuously evaluated the model's accuracy on tasks T0 and T1 at each iteration. In the train-from-scratch scenario, the model is trained on an equal number of classes in each incremental task, while in the train-from-half scenario, the model is trained on half of all classes in the first task and an equal number of classes in each subsequent task. Following standard CIL practices, we shuffled the class order with a random seed of 1993 (Rebuffi et al. 2017; Zhou et al. 2023).
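The split protocol quoted above (seeded class-order shuffle, then equal task partitions for train-from-scratch) can be sketched as follows; the function name and default sizes are illustrative, not taken from the paper's released code.

```python
import random

def split_classes_into_tasks(num_classes=100, num_tasks=5, seed=1993):
    """Shuffle the class order with a fixed seed (standard CIL practice,
    seed 1993 per the paper), then split it equally across incremental
    tasks, mirroring the train-from-scratch scenario."""
    order = list(range(num_classes))
    random.Random(seed).shuffle(order)  # deterministic class order
    per_task = num_classes // num_tasks
    return [order[i * per_task:(i + 1) * per_task] for i in range(num_tasks)]
```

For the train-from-half scenario, the first partition would instead contain `num_classes // 2` classes, with the remainder divided equally among the later tasks.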
Hardware Specification Yes Our implementation, based on PyTorch (Paszke et al. 2019) and PYCIL (Zhou et al. 2023), ran on an NVIDIA 4090 using ResNet-18 (He et al. 2016) as the model architecture.
Software Dependencies No Our implementation, based on PyTorch (Paszke et al. 2019) and PYCIL (Zhou et al. 2023), ran on an NVIDIA 4090 using ResNet-18 (He et al. 2016) as the model architecture.
Experiment Setup Yes The models were trained with a batch size of 128 using SGD with momentum. For the baseline methods, the KD weight is set to 1. When using only L_inter, the hyperparameters α and β are set to 1 and 0, respectively. To ensure a fair comparison without affecting learning, when both L_inter and L_intra are implemented, α and β are each set to 1/2.
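The weighting scheme described above (α = 1, β = 0 for the inter-term-only variant; α = β = 1/2 when both KD terms are used) can be sketched as a simple weighted sum. The function name and the inclusion of a task loss term are assumptions for illustration, not the paper's exact implementation.

```python
def total_distillation_loss(l_task, l_inter, l_intra, alpha=0.5, beta=0.5):
    """Combine the task loss with the two KD terms using the paper's
    weighting: alpha=1, beta=0 keeps only the inter-class term, while
    alpha=beta=0.5 weights both terms equally for a fair comparison."""
    return l_task + alpha * l_inter + beta * l_intra
```

With scalar loss values this reduces to plain arithmetic, e.g. `total_distillation_loss(1.0, 2.0, 4.0)` yields `4.0` under the default α = β = 1/2.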