Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Time-Accuracy Tradeoffs in Kernel Prediction: Controlling Prediction Quality

Authors: Samory Kpotufe, Nakul Verma

JMLR 2017 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The theoretical results are validated on data from a range of real-world application domains; in particular we demonstrate that the theoretical knob performs as expected.
Researcher Affiliation | Academia | Samory Kpotufe EMAIL ORFE, Princeton University and Nakul Verma EMAIL Janelia Research Campus, HHMI
Pseudocode | Yes | Algorithm 1 below details a standard way of building an r-net offline using a farthest-first traversal. This is an O(n²) procedure, but is easily implemented to handle all r > 0 simultaneously in O(n) space (notice that Q_r ⊆ Q_{r′} in Algorithm 1 whenever r > r′).
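To make the quoted construction concrete, below is a minimal Python sketch of the standard farthest-first-traversal idea it describes. This is not a transcription of the paper's Algorithm 1: the function names, the Euclidean metric, and the representation of points as tuples are assumptions for illustration. One traversal records each point's insertion distance, so a single O(n)-space ordering yields the r-net for every r > 0.

```python
import math

def farthest_first_traversal(points):
    """Order points by farthest-first traversal, recording the distance
    at which each point was inserted. The prefix of points inserted at
    distance > r is an r-net, so one traversal serves all radii at once.
    (Illustrative sketch, not the paper's exact Algorithm 1.)"""
    n = len(points)
    order = [0]                                   # start from an arbitrary point
    dist = [math.dist(p, points[0]) for p in points]  # distance to nearest chosen center
    insertion = [math.inf]                        # the seed belongs to every net
    for _ in range(n - 1):
        i = max(range(n), key=lambda j: dist[j])  # farthest point from current centers
        insertion.append(dist[i])
        order.append(i)
        dist = [min(dist[j], math.dist(points[j], points[i])) for j in range(n)]
    return order, insertion

def r_net(order, insertion, r):
    """Q_r: the points inserted at distance > r. Nets are nested,
    so Q_r is a subset of Q_{r'} whenever r > r'."""
    return [i for i, d in zip(order, insertion) if d > r]
```

Running this on a small 1-D point set makes the nesting visible: the net for a large r is a prefix of the traversal, and shrinking r only appends further points.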
Open Source Code | Yes | Demo code. A Matlab implementation of the r-nets algorithm along with a test demo is available at http://www.cse.ucsd.edu/~naverma/code/rnets_prediction.zip.
Open Datasets | Yes | 3. The dataset was taken from Rasmussen and Williams (2006). (SARCOS dataset mentioned in Table 1) and 4. The dataset was taken from the UCI Machine Learning repository (Lichman, 2013). (CT Slices and MiniBooNE datasets mentioned in Table 1)
Dataset Splits | Yes | We select 2,000 random samples from each dataset for testing, and use (part of) the rest for training. Training sizes are logarithmically spaced from 100 samples to the maximum training dataset size. For each training size, results are averaged over 5 draws of training and testing samples. ... The bandwidth parameter for each procedure (...) was selected using 5-fold cross validation (over the training sample).
Hardware Specification | No | No specific hardware details (like GPU/CPU models, processor types, or memory amounts) are provided in the paper. It only mentions using 'Matlab' for implementation and a 'fast nearest neighbor search mechanism'.
Software Dependencies | No | The paper mentions using 'Matlab' and its default fast rangesearch functionality but does not specify a version number for Matlab or any other software dependencies with version information.
Experiment Setup | Yes | We consider increasing settings of the trade-off knob α from 1/6, 2/6, . . . , 6/6... We use the triangular kernel (K(u) = (1 − |u|)₊) for all our experiments... The bandwidth parameter for each procedure (...) was selected using 5-fold cross validation (over the training sample). Bandwidth range. For a good range of bandwidths (for any procedure) we use a two-step approach: we first approximate a good bandwidth choice h1 by iterating over 10 sizes ranging from the minimum training-data diameter to the maximum training-data diameter (equally spaced). We then use h1 to get a refined range to search, namely [h1/2, 2h1], over which we do a full sweep of 100 equally spaced bandwidth values.
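The kernel and the two-step bandwidth range from this row can be sketched in a few lines of Python. The cross-validation step that actually picks h1 from the coarse grid is omitted; function names are assumptions for illustration.

```python
def triangular_kernel(u):
    """Triangular kernel K(u) = (1 - |u|)_+ used in the experiments."""
    return max(0.0, 1.0 - abs(u))

def bandwidth_grids(diam_min, diam_max, coarse=10, fine=100):
    """Two-step bandwidth search described in the paper: a coarse grid of
    `coarse` equally spaced values between the minimum and maximum
    training-data diameters; once cross-validation (not shown) picks h1
    from it, a fine sweep of `fine` equally spaced values over
    [h1/2, 2*h1]."""
    step = (diam_max - diam_min) / (coarse - 1)
    coarse_grid = [diam_min + k * step for k in range(coarse)]

    def fine_grid(h1):
        lo, hi = h1 / 2.0, 2.0 * h1
        s = (hi - lo) / (fine - 1)
        return [lo + k * s for k in range(fine)]

    return coarse_grid, fine_grid
```

The refinement window [h1/2, 2h1] is symmetric on a logarithmic scale around h1, which matches the intuition that bandwidth quality varies multiplicatively rather than additively.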