Learning Imbalanced Data with Beneficial Label Noise
Authors: Guangzheng Hu, Feng Liu, Mingming Gong, Guanghui Wang, Liuhua Peng
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validated the superiority of LNR on synthetic and real-world datasets. Our work opens a new avenue for imbalanced learning, highlighting the potential of beneficial label noise. ... Experimental results on synthetic and real-world data validated the superiority of our proposed method and its versatility in integrating with existing algorithm-level methods. |
| Researcher Affiliation | Academia | 1School of Mathematics and Statistics, University of Melbourne, Victoria, Australia 2School of Computing and Information Systems, University of Melbourne, Victoria, Australia 3School of Statistics and Data Science, LPMC, KLMDASR, and LEBPS, Nankai University. Correspondence to: Liuhua Peng <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Flip labels in binary classification with noise model ρ(x) ... Algorithm 2 Constructing label noises during training at round t |
| Open Source Code | Yes | Code is available at https://github.com/guangzhengh/LNR.git |
| Open Datasets | Yes | Datasets. Imbalanced binary classification: We evaluated LNR on 32 datasets from the KEEL repository (Derrac et al., 2015), featuring a wide range of imbalance ratios from 1.82 to 49.6. ... Imbalanced multi-class classification: We conducted experiments on two multi-class image datasets: CIFAR-10 and CIFAR-100. |
| Dataset Splits | Yes | We employed various seeds to generate 30 sets of synthetic training and testing datasets, with the sample size Ntrain = 500 and Ntest = 2000 for all 8 distributions. ... For the KEEL datasets, we use different seeds to partition the data into 100 sets of training and testing datasets at a ratio of 7:3, ensuring at least 10 minority class samples in each testing set. ... For long-tailed CIFAR-10, we report the average accuracy across Many-shot (Nc > 1000), Medium-shot (100 < Nc <= 1000), and Few-shot (Nc < 100), where Nc = [5000, 2997, 1796, 1077, 645, 387, 232, 139, 83, 50]. For long-tailed CIFAR-100, we define the Many-shot (Nc > 200), Medium-shot (20 < Nc <= 200), and Few-shot (Nc < 20). |
| Hardware Specification | Yes | Hardware information. Type of CPU: One Intel(R) Core(TM) i7-10700K CPU @ 3.80GHz with 16 cores. Type of GPU: One NVIDIA GeForce RTX 4080 SUPER with 16GB of memory. |
| Software Dependencies | No | Re-sampling methods are compared and implemented using the imbalanced-learn package in Python. Classifiers KNN, CART, and MLP are implemented with the Python scikit-learn package... The ResNet-32 is implemented with PyTorch. |
| Experiment Setup | Yes | The hyperparameters of classifiers are listed in Table 5. ... The maximum training epoch of the MLP classifier is set to 800 as MLP converges faster on synthetic datasets. ... For all experiments, the ResNet-32 is trained for up to 100 epochs using the SGD (stochastic gradient descent) optimizer with a momentum of 0.9 and a learning rate of 0.1 for the first 60 epochs, 0.01 for epochs 60-80, and 0.0001 after epoch 80. |
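The pseudocode row above quotes "Algorithm 1: Flip labels in binary classification with noise model ρ(x)". A minimal sketch of that flipping step, assuming an instance-dependent flip probability (the function `flip_labels` and the choice of passing `rho` as a callable are illustrative; the paper's actual noise model is defined in Algorithm 1 itself):

```python
import random

def flip_labels(X, y, rho, rng=None):
    """Return a copy of binary labels y where each y_i is flipped to
    1 - y_i with probability rho(x_i).

    rho is a hypothetical instance-dependent noise function; how LNR
    constructs it is specified in the paper, not reproduced here.
    """
    rng = rng or random.Random(0)
    flipped = []
    for x, label in zip(X, y):
        if rng.random() < rho(x):
            flipped.append(1 - label)  # inject label noise
        else:
            flipped.append(label)      # keep the clean label
    return flipped
```

With `rho = lambda x: 0.0` the labels pass through unchanged, which is a quick sanity check before plugging in a learned noise model.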
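The KEEL split protocol in the Dataset Splits row (7:3 partitions that guarantee at least 10 minority samples in each test set) can be sketched as follows. The function name and the re-draw-with-next-seed strategy are assumptions for illustration; the paper may enforce the constraint differently:

```python
import random

def split_with_minority_floor(y, ratio=0.7, min_minority_test=10, seed=0):
    """Shuffle indices into a train/test split at the given ratio,
    re-drawing with successive seeds until the test set contains at
    least min_minority_test samples of the minority class."""
    minority = min(set(y), key=y.count)  # least frequent class label
    while True:
        rng = random.Random(seed)
        idx = list(range(len(y)))
        rng.shuffle(idx)
        cut = int(ratio * len(y))
        train, test = idx[:cut], idx[cut:]
        if sum(1 for i in test if y[i] == minority) >= min_minority_test:
            return train, test
        seed += 1  # constraint violated: try the next seed
```

Repeating this with 100 different starting seeds yields the 100 train/test partitions described above.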