reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Optimal Sample Complexity of M-wise Data for Top-K Ranking

Authors: Minje Jang, Sunghyun Kim, Changho Suh, Sewoong Oh

NeurIPS 2017 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We run numerical experiments using synthetic data and conﬁrm that the optimal sample size decreases at the rate of 1/M. Moreover, running our algorithm on real-world data, we ﬁnd that its applicability extends to settings that may not ﬁt the PL model. We conduct numerical experiments to support our result.
Researcher Affiliation	Academia	Minje Jang School of Electrical Engineering KAIST EMAIL Sunghyun Kim Electronics and Telecommunications Research Institute Daejeon, Korea EMAIL Changho Suh School of Electrical Engineering KAIST EMAIL Sewoong Oh Industrial and Enterprise Systems Engineering Department UIUC EMAIL
Pseudocode	Yes	Algorithm 1 Rank Centrality [37] Input the collection of statistics s = s I : I E(M) . Convert the M-wise sample for each hyper-edge I into M pairwise samples: 1. Choose a circular permutation of the items in I uniformly at random, 2. Break it into the M pairs of adjacent items, and denote the set of pairs by φ(I), 3. Use the (pairwise) data of the pairs in φ(I).
Open Source Code	No	The paper does not include any explicit statement about releasing source code or a link to a code repository.
Open Datasets	No	The paper uses synthetic data and real-world data collected from League of Legends, but does not provide concrete access information (e.g., specific link, DOI, repository name, formal citation with authors/year) for a publicly available dataset.
Dataset Splits	No	The paper discusses experimental parameters like 'L: number of repeated comparisons' and 'p: probability of edge existence', but does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning.
Hardware Specification	No	The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies	No	The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup	Yes	We set constant c1 = 2, and set pdense = 0.25 and psparse = 0.025, to make each be in its proper range. We use n = 500, K = 10, and K = 0.1. We set the other parameters as n = 100, L = 20, K = 5 and K = 0.3. We use n = 100, M = 4, p = 0.0025 (M 1) q log n/ n 1 M 1 , K = 5 and K = 0.3.