reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Simple, Robust and Optimal Ranking from Pairwise Comparisons

Authors: Nihar B. Shah, Martin J. Wainwright

JMLR 2017 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Section 4 provides the results of experiments on both simulated and real-world data sets. Section headings: "4. Simulations and experiments"; "4.1 Simulated data"; "4.2 Experiments on data from Amazon Mechanical Turk".
Researcher Affiliation	Academia	Nihar B. Shah: Machine Learning Department and Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA. Martin J. Wainwright: Department of Electrical Engineering and Computer Sciences and Department of Statistics, University of California, Berkeley, CA 94720, USA.
Pseudocode	No	The analysis of this paper focuses on a simple counting-based algorithm, often called the Borda count method (de Borda, 1781). We employ this method here for the setting of pairwise comparisons, noting that the Borda count method more generally also supports comparisons between more than two items. More precisely, for each distinct $i, j \in [n]$ and every integer $\ell \in [r]$, let $Y^{\ell}_{ij} \in \{-1, 0, +1\}$ represent the outcome of the $\ell$th comparison between the pair $i$ and $j$, defined as... For each $i \in [n]$, the quantity $\sum_{j \in [n]} \sum_{\ell \in [r]} \mathbf{1}\{Y^{\ell}_{ij} = 1\}$ (6) corresponds to the number of pairwise comparisons won by item $i$.
Open Source Code	No	The paper does not provide an explicit statement or a direct link to the authors' implementation code for the methodology described.
Open Datasets	Yes	We employed a dataset of 23 images... obtained from the dataset collected by Carpenter et al. (2006). In this section, we describe three additional experiments using data collected from Amazon Mechanical Turk in our past work Shah et al. (2016a);
Dataset Splits	No	The paper describes subsampling strategies for evaluating the algorithm, such as 'subsample the responses with p = 0.5' and 'subsampled a fraction q of the data', but does not provide specific training/test/validation dataset splits typically used for model reproduction.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies	No	The paper does not provide specific software dependencies, such as library names with version numbers, needed to replicate the experiment.
Experiment Setup	No	The paper discusses evaluation methodologies and model parameters for certain simulated models, for example, 'In more detail, the six model types are given by: (I) Bradley-Terry-Luce (BTL) model...', but it does not provide concrete hyperparameter values or system-level training settings in the main text for reproducing experiments.
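
The counting rule quoted in the Pseudocode row can be illustrated with a short sketch. This is not the authors' code (the report notes none was released); it is a minimal pure-Python illustration under stated assumptions: outcomes are stored in a hypothetical nested list Y where Y[i][j][l] is the l-th outcome for the pair (i, j), only the value +1 counts as a win for item i, and ties in the win counts are broken by item index. The function names borda_win_counts and rank_by_wins are illustrative, not taken from the paper.

```python
def borda_win_counts(Y):
    """Count, for each item i, the comparisons it won.

    Y is a hypothetical n x n x r nested list with entries in {-1, 0, +1}.
    Y[i][j][l] == +1 is taken to mean item i won the l-th comparison
    against item j; any other value is not counted as a win.
    """
    n = len(Y)
    counts = [0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # an item is never compared with itself
            counts[i] += sum(1 for y in Y[i][j] if y == 1)
    return counts


def rank_by_wins(Y):
    """Order items from most to fewest wins (ties broken by index, an assumption)."""
    counts = borda_win_counts(Y)
    return sorted(range(len(counts)), key=lambda i: -counts[i])


# Toy data: n = 3 items, r = 2 comparison rounds per pair.
Y = [
    [[0, 0], [1, 1], [1, 0]],     # item 0 vs items 0, 1, 2
    [[-1, -1], [0, 0], [1, -1]],  # item 1 vs items 0, 1, 2
    [[-1, 0], [-1, 1], [0, 0]],   # item 2 vs items 0, 1, 2
]

counts = borda_win_counts(Y)
ranking = rank_by_wins(Y)
print(counts, ranking)
```

In this toy example item 0 wins three comparisons and items 1 and 2 win one each, so item 0 is ranked first. How the paper handles ties and which items it selects from the counts are not covered by the quoted excerpt, so those choices here are placeholders.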