Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Decontamination of Mutual Contamination Models
Authors: Julian Katz-Samuels, Gilles Blanchard, Clayton Scott
JMLR 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this Section, we perform experiments that suggest that joint irreducibility of P₁, …, P_L is a reasonable assumption. In particular, our experiments suggest that on the datasets in question, (A2) holds (which is a strictly stronger condition than joint irreducibility). We consider three datasets: classes 1, 2, and 3 of MNIST (LeCun et al., 1998), the Iris dataset (Fisher, 1936), and the Breast Cancer Wisconsin (Diagnostic) Data Set (Dheeru and Karra Taniskidou, 2017). We use the Spectral Support Estimation algorithm (De Vito et al., 2010; Rudi et al., 2014) to estimate the support of each class in each dataset. We split each dataset into training, validation, and test sets, applying the algorithm to the training set, using the validation set to pick the hyperparameters, and evaluating the performance on the test set. We average our results over 60 trials where in each trial we randomly permute the dataset, thus altering the training, validation, and test sets. Let Ŝᵢ denote an estimate of the support of class i. Tables 1, 3, and 5 display an estimate of the probability that a point sampled from Pᵢ belongs to the estimate of the support Ŝᵢ. They indicate that Spectral Support Estimation has reasonably good performance in producing Ŝᵢ's containing the support of the associated class. Tables 2, 4, and 6 use the Ŝᵢ to estimate the quantity Pr_{x∼Pᵢ}(x ∈ ⋃_{j≠i} supp(Pⱼ)), which must be strictly less than 1 for (A2) to hold. We find that our estimates are considerably less than 1, which suggests that joint irreducibility holds on these datasets. |
| Researcher Affiliation | Academia | Julian Katz-Samuels EMAIL Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109-2122, USA; Gilles Blanchard EMAIL Universität Potsdam, Institut für Mathematik, D-14476 Potsdam, Germany; Clayton Scott EMAIL Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109-2122, USA |
| Pseudocode | Yes | Algorithm 1 Residue(F₀ \| F₁); Algorithm 2 Multi Residue(F₀ \| {F₁, …, F_K}); Algorithm 3 Multiclass(P₁, …, P_L); Algorithm 4 Demix(S₁, …, S_K); Algorithm 5 Find Face(S₁, …, S_K); Algorithm 6 Face Test(S₁, …, S_K); Algorithm 7 Partial Label(Π, (P₁, …, P_M)ᵀ); Algorithm 8 Generate Candidates(k, (Q₁, …, Q_L)ᵀ); Algorithm 9 Vertex Test(Π, (P₁, …, P_M)ᵀ, (Q₁, …, Q_L)ᵀ); Algorithm 10 Residue Hat(F̂ \| Ĥ); Algorithm 11 Demix Hat(Ŝ₁, …, Ŝ_K \| ε); Algorithm 12 Find Face Hat(Ŝ₁, …, Ŝ_K \| ε); Algorithm 13 Face Test Hat(Q̂₁, …, Q̂_K \| ε); Algorithm 14 Partial Label Hat(Π, (P:₁, …, P:_M)ᵀ \| ε); Algorithm 15 Non Square Demix(P₁, …, P_M); Algorithm 16 Demix2(S₁, …, S_K); Algorithm 17 Vertex Test Hat(Π, (P:₁, …, P:_M)ᵀ, (Q̂₁, …, Q̂_L)ᵀ) |
| Open Source Code | No | The paper does not contain any explicit statement or link providing access to source code for the methodology described. |
| Open Datasets | Yes | We consider three datasets: classes 1, 2, and 3 of MNIST (LeCun et al., 1998), the Iris dataset (Fisher, 1936), and the Breast Cancer Wisconsin (Diagnostic) Data Set (Dheeru and Karra Taniskidou, 2017). |
| Dataset Splits | No | We split each dataset into training, validation, and test sets, applying the algorithm to the training set, using the validation set to pick the hyperparameters, and evaluating the performance on the test set. We average our results over 60 trials where in each trial we randomly permute the dataset, thus altering the training, validation, and test sets. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers are provided in the paper. |
| Experiment Setup | No | We split each dataset into training, validation, and test sets, applying the algorithm to the training set, using the validation set to pick the hyperparameters, and evaluating the performance on the test set. |
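The evaluation protocol quoted above (permute the data, split into train/validation/test, estimate each class's support on the training set, then estimate the coverage probability Pr_{x∼Pᵢ}(x ∈ Ŝᵢ) and the (A2) overlap quantity Pr_{x∼Pᵢ}(x ∈ ⋃_{j≠i} Ŝⱼ) on the test set) can be sketched as follows. This is not the authors' code: the Spectral Support Estimation algorithm is replaced by a simple stand-in (a point is in Ŝᵢ if it lies within radius `r` of some training point of class i), and names such as `run_trial` and `support_estimate` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def support_estimate(train, r):
    """Stand-in support estimator: membership test for the estimated
    support of one class (within radius r of some training point)."""
    def contains(x):
        return np.min(np.linalg.norm(train - x, axis=1)) <= r
    return contains

def run_trial(X, y, classes, r=1.0, frac=(0.6, 0.2, 0.2)):
    # Randomly permute the dataset, then split train/validation/test.
    idx = rng.permutation(len(X))
    X, y = X[idx], y[idx]
    n_tr = int(frac[0] * len(X))
    n_va = int(frac[1] * len(X))
    tr, te = slice(0, n_tr), slice(n_tr + n_va, len(X))
    # (The held-out validation slice would be used to tune r in the
    # paper's protocol; it is unused in this sketch.)
    supports = {i: support_estimate(X[tr][y[tr] == i], r) for i in classes}
    coverage, overlap = {}, {}
    for i in classes:
        Xi = X[te][y[te] == i]
        # Estimate of Pr_{x ~ P_i}(x in S_hat_i): support estimate
        # should cover its own class.
        coverage[i] = np.mean([supports[i](x) for x in Xi])
        # Estimate of Pr_{x ~ P_i}(x in union of S_hat_j, j != i):
        # must be strictly less than 1 for (A2) to hold.
        overlap[i] = np.mean(
            [any(supports[j](x) for j in classes if j != i) for x in Xi]
        )
    return coverage, overlap

# Toy data: two well-separated 2-D Gaussian classes, so coverage should
# be high and overlap near zero.
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(8, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
cov, ov = run_trial(X, y, classes=[0, 1])
```

The paper averages these two estimates over 60 such trials per dataset; the sketch shows a single trial on synthetic data.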