Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Time-Accuracy Tradeoffs in Kernel Prediction: Controlling Prediction Quality

Authors: Samory Kpotufe, Nakul Verma

JMLR 2017 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The theoretical results are validated on data from a range of real-world application domains; in particular we demonstrate that the theoretical knob performs as expected.
Researcher Affiliation | Academia | Samory Kpotufe EMAIL ORFE, Princeton University and Nakul Verma EMAIL Janelia Research Campus, HHMI
Pseudocode | Yes | Algorithm 1 below details a standard way of building an r-net offline using a farthest-first traversal. This is an O(n²) procedure, but is easily implemented to handle all r > 0 simultaneously in O(n) space (notice that Q_r ⊆ Q_{r′} in Algorithm 1 whenever r > r′).
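To make the quoted construction concrete, below is a minimal Python sketch of the standard farthest-first-traversal idea it describes. This is not a transcription of the paper's Algorithm 1: the function names, the Euclidean metric, and the representation of points as tuples are assumptions for illustration. One traversal records each point's insertion distance, so a single O(n)-space ordering yields the r-net for every r > 0.

```python
import math

def farthest_first_traversal(points):
    """Order points by farthest-first traversal, recording the distance
    at which each point was inserted. The prefix of points inserted at
    distance > r is an r-net, so one traversal serves all radii at once.
    (Illustrative sketch, not the paper's exact Algorithm 1.)"""
    n = len(points)
    order = [0]                                   # start from an arbitrary point
    dist = [math.dist(p, points[0]) for p in points]  # distance to nearest chosen center
    insertion = [math.inf]                        # the seed belongs to every net
    for _ in range(n - 1):
        i = max(range(n), key=lambda j: dist[j])  # farthest point from current centers
        insertion.append(dist[i])
        order.append(i)
        dist = [min(dist[j], math.dist(points[j], points[i])) for j in range(n)]
    return order, insertion

def r_net(order, insertion, r):
    """Q_r: the points inserted at distance > r. Nets are nested,
    so Q_r is a subset of Q_{r'} whenever r > r'."""
    return [i for i, d in zip(order, insertion) if d > r]
```

Running this on a small 1-D point set makes the nesting visible: the net for a large r is a prefix of the traversal, and shrinking r only appends further points.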
Open Source Code | Yes | Demo code. A Matlab implementation of the r-nets algorithm along with a test demo is available at http://www.cse.ucsd.edu/~naverma/code/rnets_prediction.zip.
Open Datasets | Yes | 3. The dataset was taken from Rasmussen and Williams (2006). (SARCOS dataset mentioned in Table 1) and 4. The dataset was taken from the UCI Machine Learning repository (Lichman, 2013). (CT Slices and MiniBooNE datasets mentioned in Table 1)
Dataset Splits | Yes | We select 2,000 random samples from each dataset for testing, and use (part of) the rest for training. Training sizes are logarithmically spaced from 100 samples to the maximum training dataset size. For each training size, results are averaged over 5 draws of training and testing samples. ... The bandwidth parameter for each procedure (...) was selected using 5-fold cross validation (over the training sample).
Hardware Specification | No | No specific hardware details (like GPU/CPU models, processor types, or memory amounts) are provided in the paper. It only mentions using 'Matlab' for implementation and a 'fast nearest neighbor search mechanism'.
Software Dependencies | No | The paper mentions using 'Matlab' and its default fast rangesearch functionality but does not specify a version number for Matlab or any other software dependencies with version information.
Experiment Setup | Yes | We consider increasing settings of the trade-off knob α from 1/6, 2/6, . . . , 6/6... We use the triangular kernel (K(u) = (1 − |u|)₊) for all our experiments... The bandwidth parameter for each procedure (...) was selected using 5-fold cross validation (over the training sample). Bandwidth range. For a good range of bandwidths (for any procedure) we use a two-step approach: we first approximate a good bandwidth choice h1 by iterating over 10 sizes ranging from the minimum training-data diameter to the maximum training-data diameter (equally spaced). We then use h1 to get a refined range to search, namely [h1/2, 2h1], over which we do a full sweep of 100 equally spaced bandwidth values.
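The kernel and the two-step bandwidth range from this row can be sketched in a few lines of Python. The cross-validation step that actually picks h1 from the coarse grid is omitted; function names are assumptions for illustration.

```python
def triangular_kernel(u):
    """Triangular kernel K(u) = (1 - |u|)_+ used in the experiments."""
    return max(0.0, 1.0 - abs(u))

def bandwidth_grids(diam_min, diam_max, coarse=10, fine=100):
    """Two-step bandwidth search described in the paper: a coarse grid of
    `coarse` equally spaced values between the minimum and maximum
    training-data diameters; once cross-validation (not shown) picks h1
    from it, a fine sweep of `fine` equally spaced values over
    [h1/2, 2*h1]."""
    step = (diam_max - diam_min) / (coarse - 1)
    coarse_grid = [diam_min + k * step for k in range(coarse)]

    def fine_grid(h1):
        lo, hi = h1 / 2.0, 2.0 * h1
        s = (hi - lo) / (fine - 1)
        return [lo + k * s for k in range(fine)]

    return coarse_grid, fine_grid
```

The refinement window [h1/2, 2h1] is symmetric on a logarithmic scale around h1, which matches the intuition that bandwidth quality varies multiplicatively rather than additively.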