On Truthing Issues in Supervised Classification

Authors: Jonathan K. Su

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate the effectiveness of the methods and confirm the implication. "We conducted a number of experiments to see how the different testing and training methods performed and to check the implication of equivalent mutual information for different combinations of labelers."
Researcher Affiliation | Academia | Jonathan K. Su, MIT Lincoln Laboratory, 244 Wood Street, Lexington, MA 02421-6426, USA
Pseudocode | Yes | Algorithm 1: MMSE testing with empirical Bayes estimation of (p_D, p_FA) via ratios of jointly normal RVs. Algorithm 2: MMSE testing with empirical Bayes estimation of (p_D, p_FA) via sampling. Algorithm 3: Suboptimal estimation of (p_D, p_FA) by estimating the correct-label RVs Y. Algorithm 4: MMSE testing for multi-class classification with empirical Bayes estimation of K via sampling.
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology; it provides only a license and attribution requirements for the paper content itself, not for any accompanying code.
Open Datasets | Yes | "We use the Ionosphere binary-classification data set from the UCI Machine Learning Repository (see Dua and Graff, 2017)."
Dataset Splits | Yes | "We employ 75%/25% stratified hold-out validation since multi-fold cross-validation produced cluttered plots that were too difficult to read."
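The quoted 75%/25% stratified hold-out can be sketched with scikit-learn; the data below are a synthetic stand-in (only the split itself and the Ionosphere dimensions, 351 samples with 34 features, are taken from the source):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Ionosphere features/labels (assumption:
# the real data would be loaded from the UCI repository).
rng = np.random.default_rng(0)
X = rng.normal(size=(351, 34))        # Ionosphere: 351 samples, 34 features
y = rng.integers(0, 2, size=351)      # binary labels

# 75%/25% stratified hold-out: class proportions are preserved
# in both the training and the held-out parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
```

With `test_size=0.25`, scikit-learn rounds the test set up, so 351 samples split into 263 training and 88 hold-out samples.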
Hardware Specification | No | The paper does not provide specific hardware details for its experiments; it describes the simulation settings and algorithms but gives no information about computational resources.
Software Dependencies | No | The paper does not list ancillary software with the version numbers needed to replicate the experiments. It mentions L2 regularization and the Broyden-Fletcher-Goldfarb-Shanno method for training, but names no software libraries or versions.
Experiment Setup | Yes | "The settings were δ_i ∼ Beta(1, 5), ∀i; φ_t ∼ U(0, 0.5), ∀t; η_1 = 1 to force the first labeler to label every sample; and η_t ∼ U(0.33, 1), ∀t ∈ T \ {1}." ... "For each training method, the regularization weight λ was swept over {0.5, 1.0, ..., 10.0}, producing twenty trained models." ... "This section presents results for the single default threshold τ = 1/2."
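The quoted simulation settings can be sketched in NumPy. The sample and labeler counts below are illustrative assumptions (the quote does not give them); only the distributions, the η_1 = 1 constraint, the λ grid, and the threshold τ come from the source:

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples = 100    # assumption: illustrative, not stated in the quote
n_labelers = 5     # assumption: illustrative, not stated in the quote

# Per-sample parameter: delta_i ~ Beta(1, 5) for all i
delta = rng.beta(1.0, 5.0, size=n_samples)

# Per-labeler parameter: phi_t ~ U(0, 0.5) for all t
phi = rng.uniform(0.0, 0.5, size=n_labelers)

# Per-labeler labeling rate: eta_t ~ U(0.33, 1) for t in T \ {1},
# with eta_1 = 1 so the first labeler labels every sample
eta = rng.uniform(0.33, 1.0, size=n_labelers)
eta[0] = 1.0

# Regularization sweep: lambda in {0.5, 1.0, ..., 10.0},
# i.e. twenty trained models per training method
lambdas = np.arange(0.5, 10.0 + 0.5, 0.5)

# Single default decision threshold
tau = 0.5
```

A seeded `default_rng` keeps a replication attempt deterministic; the actual roles of δ_i and φ_t are defined in the paper itself.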