Coreset-Driven Re-Labeling: Tackling Noisy Annotations with Noise-Free Gradients

Authors: Saumyaranjan Mohanty, Konda Reddy Mopuri

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Through extensive evaluation over the CIFAR-100N, WebVision, and ImageNet-1K datasets, we demonstrate that our method outperforms the SOTA coreset selection for re-labeling methods (DivideMix and SOP+). We have provided the codebase at URL.
Researcher Affiliation Academia Saumyaranjan Mohanty EMAIL Department of Artificial Intelligence Indian Institute of Technology Hyderabad Konda Reddy Mopuri EMAIL Department of Artificial Intelligence Indian Institute of Technology Hyderabad
Pseudocode Yes Algorithm 1 Modified Noise-free Gradients for Re-labeling algorithm
Open Source Code Yes We have provided the codebase at URL.
Open Datasets Yes CIFAR-100N (Wei et al., 2022) is the CIFAR-100 dataset with human-annotated real-world noisy labels collected from Amazon Mechanical Turk. It consists of 50,000 colour images of dimension 32 × 32 × 3 from 100 different classes, each class having 500 images. WebVision (Li et al., 2017) contains 2.4M images crawled from the Web using the 1,000 concepts in ImageNet-1K (Deng et al., 2009). Similar to prior works (Chen et al., 2019; Park et al., 2023), we use the mini-WebVision version consisting of the first 50 classes of the Google image subset with approximately 66,000 training images. Following the approach in Park et al. (2023), we introduced 20% asymmetric noise to the ImageNet-1K (Deng et al., 2009) dataset.
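The 20% asymmetric noise applied to ImageNet-1K can be sketched as follows. This is a minimal illustration, not the paper's implementation: asymmetric noise flips each label to one fixed "confusable" class with probability equal to the noise rate, and the circular mapping y → (y + 1) mod num_classes used here is a common illustrative convention; the exact class pairing in Park et al. (2023) may differ.

```python
import numpy as np

def inject_asymmetric_noise(labels, noise_rate=0.2, num_classes=1000, seed=0):
    """Flip each label to one fixed target class with probability noise_rate.

    The target mapping y -> (y + 1) % num_classes is an assumption made
    for illustration; real asymmetric-noise setups pair semantically
    similar classes.
    """
    rng = np.random.default_rng(seed)
    noisy = np.asarray(labels).copy()
    flip = rng.random(len(noisy)) < noise_rate  # ~noise_rate fraction flipped
    noisy[flip] = (noisy[flip] + 1) % num_classes
    return noisy
```

On a large label array, roughly 20% of the labels end up flipped to their paired class while the rest are untouched.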
Dataset Splits Yes Image Net-1K consists of 1000 classes, with 1, 281, 167 training images and 50, 000 validation images.
Hardware Specification No No specific hardware details (GPU models, CPU models, etc.) are provided in the paper.
Software Dependencies No No specific software dependencies with version numbers are mentioned in the paper.
Experiment Setup Yes Table 3: Hyper-parameter values used across multiple datasets.
Settings       CIFAR-100N  WebVision  ILSVRC
Epochs         300         100        50
Optimizer      SGD         SGD        SGD
Momentum       0.9         0.9        0.9
Weight Decay   0.0005      0.0005     0.0005
Batch Size     128         32         64
Learning Rate  0.02        0.02       0.02
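For reproduction, Table 3's per-dataset settings can be collected into plain Python configuration dicts, e.g. to drive a training script. The dict keys and structure are illustrative assumptions; only the values come from the paper's Table 3.

```python
# Hyper-parameters shared across all three datasets (Table 3).
COMMON = {
    "optimizer": "SGD",
    "momentum": 0.9,
    "weight_decay": 5e-4,  # 0.0005
    "lr": 0.02,
}

# Per-dataset epochs and batch sizes (Table 3 columns).
CONFIGS = {
    "CIFAR-100N": {**COMMON, "epochs": 300, "batch_size": 128},
    "WebVision":  {**COMMON, "epochs": 100, "batch_size": 32},
    "ILSVRC":     {**COMMON, "epochs": 50,  "batch_size": 64},
}
```

A training script would then look up `CONFIGS[dataset_name]` and pass the values to its optimizer and data loader.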