On the Role of Label Noise in the Feature Learning Process
Authors: Andi Han, Wei Huang, Zhanpeng Zhou, Gang Niu, Wuyang Chen, Junchi Yan, Akiko Takeda, Taiji Suzuki
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both synthetic and real-world setups validate our theory. ... In this section, we provide empirical evidence under both synthetic and real-world setups to support our theory. |
| Researcher Affiliation | Academia | ¹RIKEN AIP, ²University of Sydney, ³School of Computer Science & AI, Shanghai Jiao Tong University, ⁴Simon Fraser University, ⁵University of Tokyo. Correspondence to: Andi Han <EMAIL>, Wei Huang <EMAIL>, Zhanpeng Zhou <EMAIL>. |
| Pseudocode | No | The paper describes methods using mathematical formulas and textual explanations within the main body and appendices, but it does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for replicating the results is available on https://github.com/zzp1012/label-noise-theory. |
| Open Datasets | Yes | We perform experiments on the commonly used image classification dataset CIFAR-10 (Krizhevsky et al., 2009) |
| Dataset Splits | No | The paper mentions using "samples from the first two categories of CIFAR-10" for training, but it does not specify the training, validation, or test split percentages, sample counts, or refer to standard predefined splits for these categories. |
| Hardware Specification | No | The paper does not explicitly state the specific hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions standard components such as a 'VGG net' architecture and 'stochastic gradient descent (SGD)', but it does not provide version numbers for any software dependencies (e.g., Python, PyTorch, or TensorFlow versions). |
| Experiment Setup | Yes | A two-layer CNN (as defined in Section 3) is trained using gradient descent, with a total of T = 200 iterations and a learning rate of η = 0.1. |
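
For concreteness, below is a minimal PyTorch sketch of the setup quoted in the Experiment Setup and Open Datasets rows. Only the dataset choice (the first two CIFAR-10 categories), the iteration count T = 200, and the learning rate η = 0.1 come from the table above; the network width, kernel size, ReLU activation, loss function, subsample size, and label-noise rate are illustrative assumptions, not the paper's exact Section 3 definition.

```python
# Sketch: two-layer CNN trained with full-batch gradient descent for
# T = 200 iterations at learning rate eta = 0.1 on the first two
# CIFAR-10 categories, with synthetic label noise injected.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

class TwoLayerCNN(nn.Module):
    """Illustrative two-layer CNN: conv layer + linear readout."""
    def __init__(self, width=64):
        super().__init__()
        self.conv = nn.Conv2d(3, width, kernel_size=3, padding=1)  # layer 1
        self.pool = nn.AdaptiveAvgPool2d(1)                        # global pooling
        self.fc = nn.Linear(width, 1)                              # layer 2
    def forward(self, x):
        h = torch.relu(self.conv(x))
        return self.fc(self.pool(h).flatten(1)).squeeze(-1)

# Keep only the first two CIFAR-10 categories for binary classification.
dataset = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor())
idx = [i for i, y in enumerate(dataset.targets) if y in (0, 1)]
idx = idx[:512]  # subsample size is an arbitrary choice for speed
X = torch.stack([dataset[i][0] for i in idx])
y = torch.tensor([float(dataset.targets[i]) for i in idx])

# Inject symmetric label noise (the 0.2 rate is a hypothetical choice;
# the paper studies how such noise shapes feature learning).
noise_rate = 0.2
flip = torch.rand(len(y)) < noise_rate
y = torch.where(flip, 1.0 - y, y)

model = TwoLayerCNN()
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # eta = 0.1
loss_fn = nn.BCEWithLogitsLoss()

for t in range(200):  # T = 200 full-batch gradient descent steps
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
```

Note that full-batch SGD at a fixed learning rate is plain gradient descent, matching the quoted setup; the released repository linked above should be consulted for the authors' actual architecture and training script.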