Cost-Sensitive Learning with Noisy Labels

Authors: Nagarajan Natarajan, Inderjit S. Dhillon, Pradeep Ravikumar, Ambuj Tewari

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 6. Experiments: In our first set of experiments, we demonstrate the robustness of the proposed algorithms to increasing rates of label noise on synthetic and real-world data sets. In our second set of experiments, we compare the performance of our two proposed methods with state-of-the-art methods for dealing with random label noise. In our experiments, we use the two utility measures listed in Proposition 1, i.e. UAcc and UAM; note that the utility measures are computed with respect to the clean distribution. For given noise rates ρ+1 and ρ−1, labels are flipped accordingly. To account for randomness in the flips when simulating a given noise rate, we repeat each experiment 3 times, with independent corruptions of the data set for the same setting of ρ+1 and ρ−1, and present the mean accuracy over the trials. Specifically, we divide each data set randomly into three training and test sets, and compute average utility over 3 train-test splits. We use cross-validation to tune parameters specific to the algorithms. Note that we perform cross-validation on a separate validation set with noisy labels.
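The noise-injection protocol quoted above (flip each +1 label with probability ρ+1 and each −1 label with probability ρ−1, repeated over independent corruptions) can be sketched as follows. This is an illustrative reconstruction, not the paper's code; the function name and the specific noise rates are invented for the example.

```python
import numpy as np

def flip_labels(y, rho_pos, rho_neg, rng=None):
    """Simulate class-conditional label noise on labels y in {-1, +1}.

    Each +1 label is flipped with probability rho_pos and each -1 label
    with probability rho_neg, matching the noise model described above.
    """
    rng = np.random.default_rng(rng)
    y = np.asarray(y)
    flip = np.where(y == 1,
                    rng.random(y.shape) < rho_pos,   # flip +1 w.p. rho_pos
                    rng.random(y.shape) < rho_neg)   # flip -1 w.p. rho_neg
    return np.where(flip, -y, y)

# As in the experiments: repeat with independent corruptions of the same
# data set for one setting of (rho_pos, rho_neg), then average results.
y = np.array([1, 1, -1, -1, 1, -1])
corruptions = [flip_labels(y, rho_pos=0.3, rho_neg=0.1, rng=t) for t in range(3)]
```

With both rates set to zero the labels pass through unchanged, which is a useful sanity check when wiring up such an experiment.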
Researcher Affiliation | Collaboration | Nagarajan Natarajan, Microsoft Research, Bangalore 560001, India; Inderjit S. Dhillon, Dept. of Computer Science, University of Texas at Austin, Austin, TX 78701; Pradeep Ravikumar, Machine Learning Dept., Carnegie Mellon University, Pittsburgh, PA 15213; Ambuj Tewari, Dept. of Statistics, and Dept. of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109
Pseudocode | Yes | Algorithm 1: Online learning using unbiased gradients
    Choose learning rate γ > 0
    W = {w : ‖w‖₂ ≤ W₂}
    Π_W(·) = Euclidean projection onto W
    Initialize w_0 ← 0
    for i = 1 to n do
        Receive x_i ∈ R^d
        Predict ⟨w_{i−1}, x_i⟩
        Receive noisy label y_i
        Update w_i ← Π_W(w_{i−1} − γ g(⟨w_{i−1}, x_i⟩, y_i) x_i), where g(·, ·) is defined in (5)
    end for
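Algorithm 1 can be sketched in Python as below. The table does not reproduce equation (5), so `unbiased_grad` here assumes the standard method-of-unbiased-estimators correction applied to the derivative of a base loss (logistic loss is chosen for concreteness); all names, the learning rate, and the radius W2 are illustrative, not from the paper.

```python
import numpy as np

def unbiased_grad(t, y_noisy, rho_pos, rho_neg, dloss):
    """Unbiased estimate of dl/dt under class-conditional label noise.

    Hypothetical stand-in for g(., .) in eq. (5): reweights the base-loss
    derivative so its expectation over noisy labels matches the clean one.
    """
    rho_y = rho_pos if y_noisy == 1 else rho_neg      # flip prob. of this class
    rho_opp = rho_neg if y_noisy == 1 else rho_pos    # flip prob. of the other class
    return ((1 - rho_opp) * dloss(t, y_noisy)
            - rho_y * dloss(t, -y_noisy)) / (1 - rho_pos - rho_neg)

def online_noisy_sgd(X, y_noisy, rho_pos, rho_neg, gamma=0.1, W2=10.0):
    """Algorithm 1: online gradient steps on unbiased gradients, with
    Euclidean projection onto the ball {w : ||w||_2 <= W2}."""
    dlogistic = lambda t, y: -y / (1.0 + np.exp(y * t))  # d/dt log(1 + e^{-yt})
    w = np.zeros(X.shape[1])
    for x, y in zip(X, y_noisy):
        t = w @ x                                  # predict <w_{i-1}, x_i>
        g = unbiased_grad(t, y, rho_pos, rho_neg, dlogistic)
        w = w - gamma * g * x                      # unbiased gradient step
        norm = np.linalg.norm(w)
        if norm > W2:                              # project back onto the W2-ball
            w *= W2 / norm
    return w
```

A quick property check: averaging `unbiased_grad` over the noisy-label distribution recovers the clean-loss derivative, which is the point of the correction.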
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. It mentions using a third-party library, libsvm, but does not provide its own implementation code or a link to it.
Open Datasets | Yes | We use seven standard UCI classification data sets listed in Table 1; here, data sets 1 through 6 are preprocessed and made available by Gunnar Rätsch (footnote 1: http://theoval.cmp.uea.ac.uk/matlab).
Dataset Splits | Yes | Specifically, we divide each data set randomly into three training and test sets, and compute average utility over 3 train-test splits. We use cross-validation to tune parameters specific to the algorithms. Note that we perform cross-validation on a separate validation set with noisy labels.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions using the "libsvm library" but does not specify a version number, which is required for reproducibility.
Experiment Setup | Yes | In all the cases, we tune the parameters α, ρ+1 and ρ−1 by cross-validation (on a noisy validation set). For kernelized algorithms, we set the Gaussian kernel width parameter γ to 1/d, where d is the dimensionality of the data (the default parameter setting in libsvm).
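The tuning procedure quoted above, selecting (α, ρ+1, ρ−1) by cross-validation against noisy validation labels only, can be sketched generically as below. The `fit` and `score` callables are hypothetical stand-ins (the paper's own training and evaluation routines are not given in the table), and the grid values are illustrative.

```python
import numpy as np

def tune_on_noisy_validation(grid, fit, score):
    """Grid-search (alpha, rho_pos, rho_neg) using only noisy validation
    labels, since no clean labels are available at tuning time."""
    best, best_score = None, -np.inf
    for params in grid:
        model = fit(params)          # train with this hyperparameter setting
        s = score(model)             # e.g. accuracy w.r.t. noisy validation labels
        if s > best_score:
            best, best_score = params, s
    return best

# Gaussian kernel width as in the paper: libsvm's default gamma = 1/d,
# where d is the data dimensionality (d = 13 is an arbitrary example).
d = 13
gamma = 1.0 / d
```

The key design point is that `score` never sees clean labels; the paper's claim is that tuning against noisy validation data still selects usable noise-rate estimates.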