Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Combating Semantic Contamination in Learning with Label Noise

Authors: Wenxiao Fan, Kan Li

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results show that our method outperforms existing approaches on both synthetic and real-world noisy datasets, effectively mitigating the impact of label noise and Semantic Contamination." "Experimental results show that our method advances state-of-the-art results on CIFAR with synthetic label noise, as well as on real-world noisy datasets."
Researcher Affiliation | Academia | "Wenxiao Fan, Kan Li / School of Computer Science, Beijing Institute of Technology / EMAIL"
Pseudocode | Yes | "The overall pipeline is shown in Fig. 4 and the algorithm pseudocode is in Appendix."
Open Source Code | No | The paper does not explicitly state that the code is open-source, provide a repository link, or mention code in supplementary materials.
Open Datasets | Yes | "Datasets. To verify the effectiveness of our method, we perform our method on classification tasks with six benchmarks: CIFAR-10 (Krizhevsky, Hinton et al. 2009), CIFAR-100 (Krizhevsky, Hinton et al. 2009), CIFAR-10N (Wei et al. 2022), CIFAR-100N (Wei et al. 2022), Animal-10N (Song, Kim, and Lee 2019) and WebVision (Li et al. 2017)."
Dataset Splits | No | The paper reports experiments on CIFAR-10/100, CIFAR-10N/100N, Animal-10N, and WebVision, and discusses synthetic noise injection and noise rates; it also states "All the results from our runs are the average test accuracy over the last 10 epochs." However, it does not give explicit training/validation/test splits (percentages, sample counts, or split files) for these datasets, nor does it cite specific standard splits by name.
Hardware Specification | No | The paper does not provide hardware details such as GPU or CPU models or memory specifications used for the experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | "Following (Li, Socher, and Hoi 2020; Chen et al. 2023; Zhang et al. 2023), we apply the regularization Ldiv to increase the diversity of predictions"; "Following (Li, Socher, and Hoi 2020; Chen et al. 2023), we warm up the models and estimate the label confidence ω using the small-loss criterion and Gaussian Mixture Model (GMM)."; "We vary τ from 0.05 to 0.9 and range c from 0.90 to 0.99."
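The label-confidence estimation quoted above (small-loss criterion plus a Gaussian Mixture Model, following Li, Socher, and Hoi 2020) can be sketched as follows. This is a minimal illustrative sketch, not the authors' code: the function name `estimate_label_confidence` and the synthetic loss data are assumptions introduced here for illustration.

```python
# Hedged sketch: estimate per-sample label confidence w by fitting a
# two-component GMM to per-sample training losses (small-loss criterion),
# in the style of DivideMix (Li, Socher, and Hoi 2020).
import numpy as np
from sklearn.mixture import GaussianMixture

def estimate_label_confidence(losses: np.ndarray) -> np.ndarray:
    """Return the posterior probability that each sample belongs to the
    low-loss (presumed clean) GMM component."""
    # Min-max normalize losses so the GMM fit is scale-invariant.
    losses = (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)
    gmm = GaussianMixture(n_components=2, max_iter=100, reg_covar=5e-4)
    gmm.fit(losses.reshape(-1, 1))
    clean_component = gmm.means_.argmin()  # small-loss criterion
    return gmm.predict_proba(losses.reshape(-1, 1))[:, clean_component]

# Synthetic example: clean samples cluster at low loss, noisy ones at high loss.
rng = np.random.default_rng(0)
losses = np.concatenate([rng.normal(0.1, 0.05, 800), rng.normal(1.0, 0.2, 200)])
w = estimate_label_confidence(losses)
```

In a full pipeline these confidences would be computed after the warm-up phase and then used to divide or reweight the training set; thresholds such as the τ and c ranges quoted above govern downstream selection, not the GMM fit itself.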