Coreset-Driven Re-Labeling: Tackling Noisy Annotations with Noise-Free Gradients
Authors: Saumyaranjan Mohanty, Konda Reddy Mopuri
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive evaluation over the CIFAR-100N, WebVision, and ImageNet-1K datasets, we demonstrate that our method outperforms the SOTA coreset selection for re-labeling methods (DivideMix and SOP+). We have provided the codebase at URL. |
| Researcher Affiliation | Academia | Saumyaranjan Mohanty (EMAIL), Department of Artificial Intelligence, Indian Institute of Technology Hyderabad; Konda Reddy Mopuri (EMAIL), Department of Artificial Intelligence, Indian Institute of Technology Hyderabad |
| Pseudocode | Yes | Algorithm 1 Modified Noise-free Gradients for Re-labeling algorithm |
| Open Source Code | Yes | We have provided the codebase at URL. |
| Open Datasets | Yes | CIFAR-100N (Wei et al., 2022) is the CIFAR-100 dataset with human-annotated real-world noisy labels collected from Amazon Mechanical Turk. It consists of 50,000 colour images of dimension 32 × 32 × 3 from 100 different classes, each class having 500 images. WebVision (Li et al., 2017) contains 2.4M images crawled from the Web using the 1,000 concepts in ImageNet-1K (Deng et al., 2009). Similar to prior works (Chen et al., 2019; Park et al., 2023), we use the mini-WebVision version consisting of the first 50 classes of the Google image subset with approximately 66,000 training images. Following the approach in Park et al. (2023), we introduced 20% asymmetric noise to the ImageNet-1K (Deng et al., 2009) dataset. |
| Dataset Splits | Yes | ImageNet-1K consists of 1,000 classes, with 1,281,167 training images and 50,000 validation images. |
| Hardware Specification | No | No specific hardware details (GPU models, CPU models, etc.) are provided in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper. |
| Experiment Setup | Yes | Table 3: Hyper-parameter values used across multiple datasets (values listed as CIFAR-100N / WebVision / ILSVRC). Epochs: 300 / 100 / 50; Optimizer: SGD / SGD / SGD; Momentum: 0.9 / 0.9 / 0.9; Weight Decay: 0.0005 / 0.0005 / 0.0005; Batch Size: 128 / 32 / 64; Learning Rate: 0.02 / 0.02 / 0.02 |
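
The hyper-parameters reported in Table 3 can be written down as a small configuration sketch. This is an illustration only, not the authors' code: the names `TrainConfig`, `CONFIGS`, `sgd_kwargs`, and `steps_per_epoch` are hypothetical. The keyword names returned by `sgd_kwargs` are chosen to match PyTorch's `torch.optim.SGD`, which is an assumption, since the paper does not state its software dependencies.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Per-dataset settings copied from Table 3 (hypothetical container)."""
    epochs: int
    batch_size: int
    optimizer: str = "SGD"        # same optimizer for all three datasets
    momentum: float = 0.9
    weight_decay: float = 0.0005
    learning_rate: float = 0.02


# Only epochs and batch size differ between datasets in Table 3.
CONFIGS = {
    "CIFAR-100N": TrainConfig(epochs=300, batch_size=128),
    "WebVision": TrainConfig(epochs=100, batch_size=32),
    "ILSVRC": TrainConfig(epochs=50, batch_size=64),
}


def sgd_kwargs(dataset: str) -> dict:
    """Return optimizer keyword arguments for a dataset.

    Assumption: the key names follow torch.optim.SGD
    (lr, momentum, weight_decay); the paper names no framework.
    """
    cfg = CONFIGS[dataset]
    return {
        "lr": cfg.learning_rate,
        "momentum": cfg.momentum,
        "weight_decay": cfg.weight_decay,
    }


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of mini-batches needed to see num_samples once."""
    return math.ceil(num_samples / batch_size)
```

For example, `steps_per_epoch(50_000, CONFIGS["CIFAR-100N"].batch_size)` gives the number of iterations for one pass over the full CIFAR-100N training set. The count in an actual run would depend on the size of the selected coreset, which is not listed here.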