Simple, Robust and Optimal Ranking from Pairwise Comparisons

Authors: Nihar B. Shah, Martin J. Wainwright

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Section 4 provides the results of experiments on both simulated and real-world data sets." Relevant headings: 4. Simulations and experiments; 4.1 Simulated data; 4.2 Experiments on data from Amazon Mechanical Turk.
Researcher Affiliation | Academia | Nihar B. Shah: Machine Learning Department and Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA. Martin J. Wainwright: Department of Electrical Engineering and Computer Sciences and Department of Statistics, University of California, Berkeley, CA 94720, USA.
Pseudocode | No | "The analysis of this paper focuses on a simple counting-based algorithm, often called the Borda count method (de Borda, 1781). We employ this method here for the setting of pairwise comparisons, noting that the Borda count method more generally also supports comparisons between more than two items. More precisely, for each distinct $i, j \in [n]$ and every integer $\ell \in [r]$, let $Y^{\ell}_{ij} \in \{-1, 0, +1\}$ represent the outcome of the $\ell$th comparison between the pair $i$ and $j$, defined as... For each $i \in [n]$, the quantity $\sum_{j \in [n] \setminus \{i\}} \sum_{\ell \in [r]} \mathbf{1}\{Y^{\ell}_{ij} = 1\}$ (6) corresponds to the number of pairwise comparisons won by item $i$."
Open Source Code | No | The paper does not provide an explicit statement or a direct link to the authors' implementation code for the methodology described.
Open Datasets | Yes | "We employed a dataset of 23 images... obtained from the dataset collected by Carpenter et al. (2006)." "In this section, we describe three additional experiments using data collected from Amazon Mechanical Turk in our past work Shah et al. (2016a);"
Dataset Splits | No | The paper describes subsampling strategies for evaluating the algorithm, such as 'subsample the responses with p = 0.5' and 'subsampled a fraction q of the data', but does not provide the specific training/validation/test splits typically needed to reproduce a model.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies, such as library names with version numbers, needed to replicate the experiments.
Experiment Setup | No | The paper discusses evaluation methodologies and model parameters for certain simulated models, for example, 'In more detail, the six model types are given by: (I) Bradley-Terry-Luce (BTL) model...', but it does not provide concrete hyperparameter values or system-level settings in the main text for reproducing the experiments.
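The Pseudocode row quotes the paper's description of the Borda count but no pseudocode. As a minimal sketch (not the authors' implementation), the win count in equation (6) can be computed with NumPy, assuming the outcomes $Y^{\ell}_{ij}$ are stored in an array of shape (r, n, n) with +1 for a win by i, −1 for a win by j, and 0 for an unobserved comparison. The array layout and the function names borda_scores and borda_ranking are hypothetical.

```python
import numpy as np


def borda_scores(Y):
    """Number of pairwise comparisons won by each item, as in equation (6).

    Assumed (hypothetical) encoding: Y has shape (r, n, n) and Y[l, i, j] is
    +1 if i won the l-th comparison against j, -1 if j won, 0 if unobserved.
    """
    Y = np.asarray(Y)
    n = Y.shape[1]
    off_diag = ~np.eye(n, dtype=bool)   # restrict to j != i
    wins = (Y == 1) & off_diag          # broadcasts over the r axis
    return wins.sum(axis=(0, 2))        # sum over l in [r] and over j


def borda_ranking(Y):
    """Items ordered by decreasing win count (ties kept in index order)."""
    scores = borda_scores(Y)
    order = np.argsort(-scores, kind="stable")
    return order, scores


# Toy example with n = 3 items and r = 1 comparison per pair:
# item 0 beats 1 and 2, item 1 beats 2.
Y_example = np.array([[[0, 1, 1],
                       [-1, 0, 1],
                       [-1, -1, 0]]])
order, scores = borda_ranking(Y_example)
print(order, scores)  # [0 1 2] [2 1 0]
```

Items are ranked by decreasing win count; breaking ties by item index through the stable sort is an arbitrary choice of this sketch.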
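The Dataset Splits and Experiment Setup rows mention subsampling responses with probability p and simulations under the Bradley-Terry-Luce (BTL) model, without concrete settings. A hypothetical simulation along those lines might look as follows, assuming the logistic BTL form P(i beats j) = 1/(1 + exp(θ_j − θ_i)) and that each of the r comparisons per pair is kept independently with probability p. The parameter values (eight items, r = 200, p = 0.5, seed 0) and the names simulate_btl and win_counts are illustrative choices, not values or code from the paper.

```python
import numpy as np


def simulate_btl(theta, r, p, rng):
    """Hypothetical simulator: r comparisons per pair under a logistic BTL
    model, each kept independently with probability p (0 = unobserved)."""
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    Y = np.zeros((r, n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            # Assumed BTL form: P(i beats j) = 1 / (1 + exp(theta_j - theta_i))
            prob_i = 1.0 / (1.0 + np.exp(theta[j] - theta[i]))
            observed = rng.random(r) < p           # subsampling with prob. p
            i_wins = rng.random(r) < prob_i
            outcome = np.where(i_wins, 1, -1) * observed
            Y[:, i, j] = outcome
            Y[:, j, i] = -outcome                  # antisymmetric encoding
    return Y


def win_counts(Y):
    """Borda-style count of wins per item (self-comparisons excluded)."""
    n = Y.shape[1]
    return ((Y == 1) & ~np.eye(n, dtype=bool)).sum(axis=(0, 2))


rng = np.random.default_rng(0)
theta = np.linspace(2.0, -2.0, 8)   # hypothetical scores; item 0 is strongest
Y = simulate_btl(theta, r=200, p=0.5, rng=rng)
scores = win_counts(Y)
order = np.argsort(-scores, kind="stable")
print(order, scores)
```

Because each observed comparison produces exactly one +1 entry, the win counts sum to the number of observed comparisons, which is a quick sanity check on the encoding.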