Enhancing Noise-Robust Losses for Large-Scale Noisy Data Learning

Authors: Max Staats, Matthias Thamm, Bernd Rosenow

AAAI 2025

Reproducibility assessment (per criterion: Variable, Result, and the LLM response with quoted evidence)
Research Type: Experimental. "In the following, we present empirical evidence highlighting the efficiency of the logit bias in enhancing the learning capability of MAE on all datasets, demonstrating competitive or superior performance in the presence of label noise. In addition, we enable the loss functions gen. CE, NCE-AGCE and NF-MAE to learn on the more complicated WebVision dataset using either new optimized hyperparameters or the logit bias, showcasing that the overlap between the initial output distribution and δk is indeed the crucial requirement for learning. Datasets employed in this study include the publicly available datasets Cifar-10, Cifar-100 (Krizhevsky, Hinton et al. 2009), Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017), and the web-scraped WebVision dataset (Li et al. 2017). [...] Table 2 summarizes the results on the datasets Cifar-10, Cifar-100 and Fashion-MNIST, providing mean accuracies and associated errors (across five seeds) under various levels of label noise."
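The logit-bias mechanism the excerpt refers to can be sketched as follows. This is a minimal illustration based on the quoted description, not the paper's implementation: a bias ϵ is added to the true-class logit before the softmax, which raises the initial true-class probability and lets MAE-style losses start learning. Function names and the toy setup are ours.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mae_with_logit_bias(logits, labels, eps=0.0):
    """MAE between softmax outputs and one-hot targets, with a bias
    eps added to the true-class logit (illustrative sketch only)."""
    z = np.asarray(logits, dtype=float).copy()
    z[np.arange(len(labels)), labels] += eps  # logit bias on the correct class
    p = softmax(z)
    onehot = np.eye(z.shape[1])[labels]
    return np.abs(p - onehot).sum(axis=1).mean()

# At initialization the logits are roughly uniform; a small positive eps
# (e.g. the paper's eps = 0.5) lowers the initial MAE loss for the true class.
logits = np.zeros((4, 10))
labels = np.array([0, 1, 2, 3])
loss_plain = mae_with_logit_bias(logits, labels, eps=0.0)
loss_biased = mae_with_logit_bias(logits, labels, eps=0.5)
```

For uniform initial logits over K classes, the MAE loss reduces to 2(1 − p_true), so any increase in the true-class probability from the bias directly lowers the loss and strengthens the gradient signal.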
Researcher Affiliation: Collaboration. Max Staats 1,2*, Matthias Thamm 2, Bernd Rosenow 2 — 1 ScaDS.AI Dresden/Leipzig, Germany; 2 Institut für Theoretische Physik, Universität Leipzig, Brüderstrasse 16, 04103 Leipzig, Germany. EMAIL, EMAIL
Pseudocode: No. The paper describes methods and calculations in narrative text and equations (e.g., Eqs. 1–4), but does not contain explicitly labeled pseudocode or algorithm blocks.
Open Source Code: Yes. "All code for reproducing the data and creating the figures in this paper is open source and available under (Author(s) 2023)." The cited archive states: "All code, scripts, and data used in this work are included in a Zenodo archive: https://zenodo.org/records/13150928."
Open Datasets: Yes. "Datasets employed in this study include the publicly available datasets Cifar-10, Cifar-100 (Krizhevsky, Hinton et al. 2009), Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017), and the web-scraped WebVision dataset (Li et al. 2017)."
Dataset Splits: No. The paper mentions using well-known datasets (Cifar-10, Cifar-100, Fashion-MNIST, WebVision) and discusses training under various noise levels, but it does not explicitly specify how these datasets were split into training, validation, or test sets (e.g., percentages or sample counts per split) in the main text.
Hardware Specification: No. The paper states, 'Computations for this work were done (in part) using resources of the Leipzig University Computing Center,' but it does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments.
Software Dependencies: No. The paper notes that the code is open source and available in a Zenodo archive, but it does not list specific software dependencies or their version numbers (e.g., programming languages, libraries, frameworks) in the main text.
Experiment Setup: Yes. "Table 2 summarizes the results on the datasets Cifar-10, Cifar-100 and Fashion-MNIST, providing mean accuracies and associated errors (across five seeds) under various levels of label noise. For Cifar-10, we find that in the absence of label noise, MAE is unable to sufficiently learn the dataset, performing worse than all other loss functions. Yet, the introduction of a small logit bias (specifically, ϵ = 0.5) in MAE* noticeably enhances the learning... For demonstration purposes we also add ϵ = 1.5 to the table, showcasing that the logit bias ϵ can be used to tune a balance between learning speed and noise-robustness. [...] In Fig. 3 we demonstrate how this optimization changes the learning curve of gen. CE when going from 100 to 1000 classes. We find that the parameter q = 0.48 provides the same δk(0) for K = 1000 classes as in the case of K = 100 with q = 0.7."
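The generalized cross entropy (gen. CE) loss whose parameter q the excerpt tunes can be written as L_q(p) = (1 − p^q)/q, where p is the predicted probability of the true class (Zhang and Sabuncu 2018). A minimal sketch, assuming only this standard definition (the function name and the toy probabilities are ours): as q → 0 the loss recovers cross entropy −log(p), and at q = 1 it reduces to the MAE-like 1 − p, so q interpolates between the two regimes and, per the excerpt, must be re-tuned as the class count K changes the initial probabilities p ≈ 1/K.

```python
import numpy as np

def gen_ce(p_true, q):
    """Generalized cross entropy L_q = (1 - p^q) / q.
    q -> 0 recovers cross entropy -log(p); q = 1 gives 1 - p (MAE-like)."""
    p_true = np.asarray(p_true, dtype=float)
    return (1.0 - p_true**q) / q

# At initialization the true-class probability is roughly uniform, p ~ 1/K,
# so the effective loss scale at step 0 depends jointly on K and q; this is
# why the quoted experiment re-tunes q when moving from 100 to 1000 classes.
loss_k100 = gen_ce(1.0 / 100, q=0.7)    # K = 100 setting from the excerpt
loss_k1000 = gen_ce(1.0 / 1000, q=0.48)  # K = 1000 setting from the excerpt
```

Small q makes the loss steep for low-probability predictions (CE-like, fast learning), while q near 1 flattens it (MAE-like, noise-robust), matching the speed-versus-robustness trade-off described for the logit bias ϵ.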