reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Optimal Data Collection For Informative Rankings Expose Well-Connected Graphs

Authors: Braxton Osting, Christoph Brune, Stanley J. Osher

JMLR 2014

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In Section 5, we conduct a number of numerical experiments to demonstrate how the optimal data collection methodology developed in Section 4 can be employed. We begin with a few constructed examples and show that graphs can be generated which have larger algebraic connectivity than Erdős-Rényi randomly generated graphs. The rankings of the data sets represented by these well-connected graphs are more informative than those represented by Erdős-Rényi graphs. We then consider the data collection problem for ranking Yahoo! movies and for the 2011-2012 NCAA Division 1 football season. ... In Section 5.11, we continue with the graph constructed in Section 5.8 and demonstrate using synthetic data that ranking estimates obtained via active sampling are more accurate (in the sense of both the L2-distance and the Kendall-τ rank distance) than via random sampling.
Researcher Affiliation	Academia	Braxton Osting EMAIL Department of Mathematics University of Utah Salt Lake City, UT 84112, USA Christoph Brune EMAIL Department of Applied Mathematics University of Twente 7500 AE Enschede, The Netherlands Stanley J. Osher EMAIL Department of Mathematics University of California Los Angeles, CA 90095, USA
Pseudocode	Yes	Algorithm 1: A greedy heuristic for finding integer-valued edge weights w for which the w-weighted graph Laplacian has large second eigenvalue (Ghosh and Boyd, 2006b; Wang and Mieghem, 2008). See Section 3.1. Input: An initial edge weight w0 ∈ Z^N_+ defined on the complete graph of n nodes and an integer, ξ. Output: An edge weight, w ≥ w0, such that ‖w − w0‖_1 = ξ, and w has large second eigenvalue. Set w = w0 (current edge weight); for ℓ = 1 to ξ, do: Compute the second eigenvector, F = arg min_{‖v‖ = 1, ⟨v, 1⟩ = 0}; Find the edge (i, j) which maximizes (F_i − F_j)^2; Set w = w + δ_ij; end for
Open Source Code	No	The paper mentions using 'Matlab's lsqr function' and 'Matlab's eigs function' for computation, and a 'Matlab toolbox described in Traud et al. (2009)' for visualization, as well as 'wgPlot (Wu, 2009)' with a link to Mathworks. These are third-party tools or frameworks. There is no explicit statement indicating that the authors have released their own source code for the methodology described in this paper.
Open Datasets	Yes	The Yahoo! Movie user rating data set consists of an incomplete user-movie matrix where entries represent a score given to the movie by the user. ... Yahoo! Webscope dataset: ydata-ymovies-user-movie-ratings-content-v1_0. http://webscope.sandbox.yahoo.com. Accessed: 10/5/2011. ... In this section, we study the 2011-12 NCAA Division 1 football schedule, downloaded from Massey Ratings. These were obtained from http://masseyratings.com/scores.php?t=11590&s=107811&all=1&mode=2&format=0
Dataset Splits	No	The paper describes preprocessing for the Yahoo! Movie data set (discarding movies with fewer than 10 rankings, removing users without reviews) and the generation of synthetic data for the NCAA graph (a normally distributed vector as ground truth). However, it does not specify explicit training/validation/test splits for either the real or the synthetic data, so the data partitioning cannot be reproduced directly.
Hardware Specification	No	No specific hardware details (like GPU models, CPU types, or memory amounts) used for running the experiments are mentioned in the paper.
Software Dependencies	No	The paper mentions using 'Matlab's lsqr function' and 'Matlab's eigs function', a 'Matlab toolbox described in Traud et al. (2009)', and 'wgPlot (Wu, 2009)'. While software is named, specific version numbers for Matlab or any of the toolboxes are not provided, which is necessary for reproducibility.
Experiment Setup	Yes	In Section 5.11, for the synthetic data experiment, the paper states: 'We take as ground truth rating, φ, a normally distributed vector with mean zero and variance, σ² = 1. The ground truth rating, φ, is used to generate new data according to the normal model (11) with σ² = 5.' It also specifies 'We choose ξ = 693, so that the number of pairwise comparisons (games played) is doubled.'
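
The greedy heuristic quoted in the Pseudocode row can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' Matlab code: the function names, the dictionary representation of edge weights, and the use of a dense NumPy eigensolver are all assumptions made here. It assumes the second eigenvector is the minimizer of the Laplacian quadratic form over unit vectors orthogonal to the constant vector, and it enforces that constraint by shifting the constant direction out of the way.

```python
import itertools
import numpy as np


def laplacian(n, weights):
    """Dense weighted graph Laplacian; `weights` maps (i, j) with i < j to a weight."""
    L = np.zeros((n, n))
    for (i, j), w in weights.items():
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L


def second_eigenvector(L):
    """Unit vector minimizing <v, L v> subject to <v, 1> = 0."""
    n = L.shape[0]
    # Move the constant vector's eigenvalue (0) above all others so that the
    # smallest eigenvector of the shifted matrix is orthogonal to the ones vector.
    shift = np.trace(L) + 1.0
    shifted = L + shift * np.ones((n, n)) / n
    _, vecs = np.linalg.eigh(shifted)  # eigenvalues in ascending order
    return vecs[:, 0]


def greedy_add_edges(n, w0, xi):
    """Add xi unit edge weights, one at a time, following the greedy rule."""
    w = dict(w0)  # current edge weight, starts at w0
    for _ in range(xi):
        f = second_eigenvector(laplacian(n, w))
        # Pick the pair (i, j) with the largest squared difference (F_i - F_j)^2.
        best = max(
            itertools.combinations(range(n), 2),
            key=lambda e: (f[e[0]] - f[e[1]]) ** 2,
        )
        w[best] = w.get(best, 0) + 1  # w = w + delta_ij
    return w


def algebraic_connectivity(n, weights):
    """Second-smallest eigenvalue of the weighted graph Laplacian."""
    return np.linalg.eigvalsh(laplacian(n, weights))[1]


# Hypothetical example: a path on 4 nodes, with one extra comparison to place.
path = {(0, 1): 1, (1, 2): 1, (2, 3): 1}
improved = greedy_add_edges(4, path, xi=1)
```

In this toy example the heuristic joins the two ends of the path, which turns it into a cycle and raises the second eigenvalue. For graphs of the size studied in the paper, a sparse eigensolver would be the more natural choice than the dense one used here.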