Balancing Model Efficiency and Performance: Adaptive Pruner for Long-tailed Data

Authors: Zhe Zhao, Haibin Wen, Pengkun Wang, Shuang Wang, Zhenkun Wang, Qingfu Zhang, Yang Wang

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate that LTAP outperforms existing methods on various long-tailed datasets, achieving a good balance between model compression rate, computational efficiency, and classification accuracy.
Researcher Affiliation Academia 1University of Science and Technology of China, Hefei, China 2City University of Hong Kong, Hong Kong 3Suzhou Institute for Advanced Research, USTC, Suzhou, China 4Southern University of Science and Technology, Shenzhen, China 5Anhui Provincial Key Laboratory of High Performance Computing, Hefei, China. Correspondence to: Pengkun Wang <EMAIL>, Yang Wang <EMAIL>.
Pseudocode Yes E. Pseudocode
Open Source Code Yes The code is available at https://github.com/DataLab-atom/LT-VOTE.
Open Datasets Yes Datasets. CIFAR-100-LT is a long-tailed version of CIFAR-100, containing 100 classes with two imbalance ratios (IR = 50, 100). ImageNet-LT is a long-tailed version of ImageNet, with 1,000 classes and a natural long-tailed distribution. iNaturalist 2018 is a large-scale real-world dataset with 8,142 species categories and an inherent long-tailed distribution.
Dataset Splits Yes For the CIFAR-100-LT dataset, we follow the general experimental settings of (Cao et al., 2019) and use ResNet-32 (proposed by (He et al., 2016)) as the backbone network. ... For ImageNet-LT and iNaturalist 2018 datasets, we use ResNet-50 as the backbone network...
Hardware Specification No No specific hardware details (GPU/CPU models, memory amounts, or processor types) are provided in the paper.
Software Dependencies No The paper mentions an optimizer (SGD) and backbone networks (ResNet-32, ResNet-50) but does not provide specific software dependency versions (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup Yes Implementation Details. We use the knowledge generated from the long-tailed recognition task to guide the pruning of the backbone network. Specifically, for each parameter in the model, we calculate scores using magnitude, avg magnitude, cosine similarity, taylor first order, and taylor second order during the gradient descent process. These scores are then weighted based on the cumulative change in accuracy for each class on the validation set. The weighted sum of the scores is used to determine whether to prune a parameter. We start the continuous pruning process after the 100th epoch, and the final model retains 30% of the original parameters. For the final evaluation phase, we use the same settings as DODA (Wang et al., 2024) for all baseline methods and our method. For the CIFAR-100-LT dataset, we follow the general experimental settings of (Cao et al., 2019) and use ResNet-32 (proposed by (He et al., 2016)) as the backbone network. The network is trained for 200 epochs using the SGD optimizer with an initial learning rate of 10^-4, momentum of 0.9, and weight decay of 2×10^-4. For ImageNet-LT and iNaturalist 2018 datasets, we use ResNet-50 as the backbone network, train the network for 100 epochs with an initial learning rate of 0.1, and reduce the learning rate by 0.1 at the 60th and 80th epochs. For all experiments, we set the value of the hyperparameter pau to 0.5.
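The score-and-vote pruning loop quoted above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the random toy tensors, the softmax normalisation of the per-class accuracy changes into voting weights, and the diagonal-Hessian proxy for the second-order Taylor term are all assumptions made for the sketch; only the five criteria names, the accuracy-based weighting, and the 30% retention target come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for one layer (illustrative, not the authors' code).
w = rng.normal(size=1000)       # current weights
g = rng.normal(size=1000)       # accumulated gradients from SGD steps
h = rng.normal(size=1000) ** 2  # proxy for a diagonal-Hessian estimate (assumption)

# Layer-level cosine similarity between weights and gradients,
# broadcast to every parameter in the layer.
cos = np.abs(w @ g) / (np.linalg.norm(w) * np.linalg.norm(g) + 1e-12)

# The five per-parameter importance criteria named in the paper.
scores = {
    "magnitude":           np.abs(w),
    "avg_magnitude":       np.abs(w) / (np.abs(w).mean() + 1e-12),
    "cosine_similarity":   np.full_like(w, cos),
    "taylor_first_order":  np.abs(w * g),
    "taylor_second_order": np.abs(0.5 * w**2 * h),
}

# Hypothetical voting weights from cumulative per-class validation-accuracy
# changes, softmax-normalised so the five criteria weights sum to 1.
acc_change = rng.normal(size=len(scores))
vote = np.exp(acc_change) / np.exp(acc_change).sum()

# Weighted sum of the criteria decides which parameters survive.
combined = sum(v * s for v, s in zip(vote, scores.values()))

# Retain the top 30% of parameters, prune the rest (binary mask).
keep_ratio = 0.30
threshold = np.quantile(combined, 1.0 - keep_ratio)
mask = combined >= threshold
print(f"fraction kept: {mask.mean():.2f}")
```

In the paper this decision is re-applied continuously from epoch 100 onward rather than once, so the mask can shift as the per-class accuracy votes evolve during training.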