Mixture of Balanced Information Bottlenecks for Long-Tailed Visual Recognition

Authors: Yifan Lan, Xin Cai, Jun Cheng, Shan Tan

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We conduct experiments on commonly used long-tailed datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018. Both BIB and MBIB reach state-of-the-art performance for long-tailed visual recognition.
Researcher Affiliation Academia Yifan Lan EMAIL Huazhong University of Science and Technology Xin Cai EMAIL Huazhong University of Science and Technology Jun Cheng EMAIL Huazhong University of Science and Technology Shan Tan EMAIL Huazhong University of Science and Technology
Pseudocode Yes The code of BIB loss is presented in Appendix J. ... Figure 10: Python code of BIB Loss.
Open Source Code Yes The code of BIB loss is presented in Appendix J. ... Figure 10: Python code of BIB Loss.
Open Datasets Yes We conduct experiments on commonly used long-tailed datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
Dataset Splits Yes CIFAR-100 includes 60K images, of which 50K are for training and 10K for testing. ... The ImageNet-LT training set contains 115.8K images... There are 20 images per class in the validation set and 50 images per class in the test set. ... Following the setting in Liu et al. (2019), we divide the dataset into three subsets according to the number of samples: Many shot (more than 100 samples), Medium shot (between 20 and 100 samples), and Few shot (fewer than 20 samples).
Hardware Specification No No specific hardware details (like GPU/CPU models, processors, or memory) are provided in the paper.
Software Dependencies No The paper mentions Python, PyTorch (torch, torch.nn.functional), and NumPy in the code snippet in Appendix J, and the SGD optimizer in the implementation details, but specific version numbers for these software dependencies are not provided.
Experiment Setup Yes For CIFAR-100-LT, we process samples in the same way as in Cao et al. (2019). We use ResNet-32 as the backbone network. To keep consistent with the previous settings (Cao et al., 2019), we use the SGD optimizer with a momentum of 0.9 and weight decay of 0.0003. We train 200 epochs for each model. The initial learning rate is 0.1, and the first five epochs use linear warm-up. The learning rate decays by 0.01 at the 160th and the 180th epoch. The batch size of all experiments is 128. ... For the setting of hyperparameters, we take β in {0, 1, 2, 3, 4, 5} according to different datasets. For all of the datasets, we use a = 0.1, b = 0.3 and m = 0.1. For CIFAR100-LT, we use γ = 0, and for ImageNet-LT and iNaturalist 2018, we use γ = 0.5.
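Two of the quoted details above are mechanical enough to sketch in code: the learning-rate schedule (5-epoch linear warm-up to 0.1, then decay by a factor of 0.01 at epochs 160 and 180) and the Many/Medium/Few-shot class partition from Liu et al. (2019). This is a minimal illustrative sketch, not the authors' implementation; the function names and the plain-dict representation of class counts are assumptions for illustration, and the BIB loss itself is not reproduced here.

```python
def lr_at_epoch(epoch, base_lr=0.1, warmup_epochs=5,
                milestones=(160, 180), gamma=0.01):
    """Learning rate for a given epoch, per the quoted setup:
    linear warm-up over the first 5 epochs, then multiply the
    base rate of 0.1 by 0.01 at epochs 160 and 180."""
    if epoch < warmup_epochs:
        # Linear ramp: epoch 0 starts at base_lr / warmup_epochs.
        return base_lr * (epoch + 1) / warmup_epochs
    lr = base_lr
    for milestone in milestones:
        if epoch >= milestone:
            lr *= gamma
    return lr


def shot_subsets(class_counts):
    """Partition class labels into Many (>100), Medium (20-100, inclusive),
    and Few (<20) shot subsets, following the split attributed to
    Liu et al. (2019) in the dataset-splits row."""
    many = [c for c, n in class_counts.items() if n > 100]
    medium = [c for c, n in class_counts.items() if 20 <= n <= 100]
    few = [c for c, n in class_counts.items() if n < 20]
    return many, medium, few
```

For example, `lr_at_epoch(0)` gives 0.02 (first warm-up step), `lr_at_epoch(100)` gives 0.1, and `lr_at_epoch(180)` gives 1e-5 after both decay steps.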