On the Role of Label Noise in the Feature Learning Process
Authors: Andi Han, Wei Huang, Zhanpeng Zhou, Gang Niu, Wuyang Chen, Junchi Yan, Akiko Takeda, Taiji Suzuki
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both synthetic and real-world setups validate our theory. ... In this section, we provide empirical evidence under both synthetic and real-world setups to support our theory. |
| Researcher Affiliation | Academia | ¹RIKEN AIP, ²University of Sydney, ³School of Computer Science & AI, Shanghai Jiao Tong University, ⁴Simon Fraser University, ⁵University of Tokyo. Correspondence to: Andi Han <EMAIL>, Wei Huang <EMAIL>, Zhanpeng Zhou <EMAIL>. |
| Pseudocode | No | The paper describes methods using mathematical formulas and textual explanations within the main body and appendices, but it does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for replicating the results is available on https://github.com/zzp1012/label-noise-theory. |
| Open Datasets | Yes | We perform experiments on the commonly used image classification dataset CIFAR-10 (Krizhevsky et al., 2009) |
| Dataset Splits | No | The paper mentions using "samples from the first two categories of CIFAR-10" for training, but it does not specify the training, validation, or test split percentages, sample counts, or refer to standard predefined splits for these categories. |
| Hardware Specification | No | The paper does not explicitly state the specific hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions standard components such as a 'VGG net' architecture and 'stochastic gradient descent (SGD)', but it does not provide version numbers for any software dependencies (e.g., Python, PyTorch, or TensorFlow versions). |
| Experiment Setup | Yes | A two-layer CNN (as defined in Section 3) is trained using gradient descent, with a total of T = 200 iterations and a learning rate of η = 0.1. |
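
For concreteness, below is a minimal PyTorch sketch of the setup quoted in the Experiment Setup and Open Datasets rows. Only the dataset choice (the first two CIFAR-10 categories), the iteration count T = 200, and the learning rate η = 0.1 come from the table above; the network width, kernel size, ReLU activation, loss function, subsample size, and label-noise rate are illustrative assumptions, not the paper's exact Section 3 definition.

```python
# Sketch: two-layer CNN trained with full-batch gradient descent for
# T = 200 iterations at learning rate eta = 0.1 on the first two
# CIFAR-10 categories, with synthetic label noise injected.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

class TwoLayerCNN(nn.Module):
    """Illustrative two-layer CNN: conv layer + linear readout."""
    def __init__(self, width=64):
        super().__init__()
        self.conv = nn.Conv2d(3, width, kernel_size=3, padding=1)  # layer 1
        self.pool = nn.AdaptiveAvgPool2d(1)                        # global pooling
        self.fc = nn.Linear(width, 1)                              # layer 2
    def forward(self, x):
        h = torch.relu(self.conv(x))
        return self.fc(self.pool(h).flatten(1)).squeeze(-1)

# Keep only the first two CIFAR-10 categories for binary classification.
dataset = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor())
idx = [i for i, y in enumerate(dataset.targets) if y in (0, 1)]
idx = idx[:512]  # subsample size is an arbitrary choice for speed
X = torch.stack([dataset[i][0] for i in idx])
y = torch.tensor([float(dataset.targets[i]) for i in idx])

# Inject symmetric label noise (the 0.2 rate is a hypothetical choice;
# the paper studies how such noise shapes feature learning).
noise_rate = 0.2
flip = torch.rand(len(y)) < noise_rate
y = torch.where(flip, 1.0 - y, y)

model = TwoLayerCNN()
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # eta = 0.1
loss_fn = nn.BCEWithLogitsLoss()

for t in range(200):  # T = 200 full-batch gradient descent steps
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
```

Note that full-batch SGD at a fixed learning rate is plain gradient descent, matching the quoted setup; the released repository linked above should be consulted for the authors' actual architecture and training script.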