Data-driven Rank Breaking for Efficient Rank Aggregation

Authors: Ashish Khetan, Sewoong Oh

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Figure 2, we verify the scaling of the resulting error via numerical simulations. We fix d = 1024 and κj = κ = 128, and vary the number of separators ℓj = ℓ for fixed n = 128,000 (left), and vary the number of samples n for fixed ℓj = ℓ = 16 (middle). Each point is averaged over 100 instances. The plot confirms that the mean squared error scales as 1/(ℓn). ... On real-world data sets on sushi preferences (Kamishima, 2003), we show that the data-driven rank-breaking improves over the Generalized Method-of-Moments (GMM) proposed by Azari Soufiani et al. (2013). ... Figure 9: The data-driven rank-breaking achieves smaller error compared to the state-of-the-art GMM approach.
Researcher Affiliation | Academia | Ashish Khetan (EMAIL), Sewoong Oh (EMAIL), Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
Pseudocode | No | The paper describes the 'Consistent rank-breaking estimator' by providing mathematical formulas for the convex optimization problem (Equations 2 and 3). It also mentions 'the algorithm described in (33)'. However, it does not present any of these in a clearly labeled or structured pseudocode or algorithm block, relying instead on textual descriptions and mathematical expressions.
Open Source Code | No | The paper mentions using 'Hunter (2004) implementation of the ML estimator' and compares performance with the 'GMM algorithm' (Azari Soufiani et al. (2013)). These refer to third-party tools or methods by other researchers. The authors do not state that they are releasing their own code for the methodology described in this paper, nor do they provide any links to a code repository.
Open Datasets | Yes | On real-world data sets on sushi preferences (Kamishima, 2003), we show that the data-driven rank-breaking improves over the Generalized Method-of-Moments (GMM) proposed by Azari Soufiani et al. (2013). ... We compare our algorithm with the GMM algorithm on two other real-world data-sets as well. We use jester data set (Goldberg et al., 2001)... We perform similar experiments on American Psychological Association (APA) data-set (Diaconis, 1989).
Dataset Splits | No | The paper describes how partial rankings were simulated from existing full datasets (e.g., 'removing the known ordering among bottom 10 − ℓ alternatives for each sample' or 'simulating random-ℓ separators'). It also states 'Each point is average over 100 instances', indicating repeated experiments. However, it does not provide specific train/test/validation dataset splits (e.g., percentages, sample counts, or references to predefined splits for model training and evaluation).
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions the 'Hunter (2004) implementation of the ML estimator' for computing estimates and the 'Generalized Method-of-Moments (GMM) proposed by Azari Soufiani et al. (2013)'. However, it does not list any specific software or library names with version numbers that would be necessary to reproduce the experimental environment.
Experiment Setup | Yes | In Figure 2, we verify the scaling of the resulting error via numerical simulations. We fix d = 1024 and κj = κ = 128, and vary the number of separators ℓj = ℓ for fixed n = 128,000 (left), and vary the number of samples n for fixed ℓj = ℓ = 16 (middle). Each point is averaged over 100 instances. ... We fix d = 1024 items and the underlying preference vector θ is uniformly distributed over [−b, b] for b = 2. We generate n rankings over sets Sj of size κ for j ∈ [n] according to the PL model with parameter θ.
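The quoted setup (rankings over κ-item subsets drawn from the Plackett-Luce model with a preference vector θ uniform on [−b, b]) can be sketched as below. The parameter values come from the quote; the function name and sampling code are illustrative, not from the paper.

```python
import numpy as np

def sample_pl_ranking(theta, rng):
    """Sample one full ranking from the Plackett-Luce model: items are
    picked sequentially, each with probability proportional to
    exp(theta_i) among the items not yet ranked."""
    weights = np.exp(theta)
    items = np.arange(len(theta))
    ranking = []
    for _ in range(len(theta)):
        p = weights[items] / weights[items].sum()
        pick = rng.choice(len(items), p=p)
        ranking.append(int(items[pick]))
        items = np.delete(items, pick)
    return ranking

rng = np.random.default_rng(0)
d, kappa, b = 1024, 128, 2           # values from the quoted setup
theta = rng.uniform(-b, b, size=d)   # preference vector on [-b, b]

# One sample: draw a subset S_j of size kappa, then rank it under PL.
S_j = rng.choice(d, size=kappa, replace=False)
ranking = sample_pl_ranking(theta[S_j], rng)  # positions within S_j
```

Repeating this for j = 1, ..., n and breaking each ranking at ℓ separators reproduces the kind of data the paper's simulations are run on.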