Gradients Weights improve Regression and Classification

Authors: Samory Kpotufe, Abdeslam Boularias, Thomas Schultz, Kyoungok Kim

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the theoretical intuition in extensive experiments on many real-world datasets in Section 5. The resulting instantiations of GW evaluate successfully in practice as shown in Section 5.
Researcher Affiliation | Academia | Samory Kpotufe (EMAIL), Princeton University, Princeton, NJ; Abdeslam Boularias (EMAIL), Rutgers University, New Brunswick, NJ; Thomas Schultz (EMAIL), University of Bonn, Germany; Kyoungok Kim (EMAIL), Seoul National University of Science & Technology (Seoul Tech), Korea
Pseudocode | No | The paper describes the method (Gradient Weighting) and the estimator in textual form, e.g., 'More precisely (see Section 3) ρ_{n,i} has the form E_n |f_{n,h}(X + t·e_i) − f_{n,h}(X − t·e_i)| / 2t', but does not include any clearly labeled pseudocode or algorithm blocks.
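Since the paper gives the estimator only in textual form, the quoted expression can be sketched as follows. This is a minimal illustration, not the authors' released code; the kernel regressor, function names, and the choice of a Gaussian kernel are my own assumptions.

```python
import numpy as np

def kernel_regressor(X_train, y_train, h):
    """Nadaraya-Watson estimate f_{n,h} with a Gaussian kernel of bandwidth h
    (an assumed instantiation; the paper covers kernel and k-NN estimates)."""
    def f(x):
        w = np.exp(-np.sum((X_train - x) ** 2, axis=1) / (2 * h ** 2))
        s = w.sum()
        return y_train.mean() if s == 0 else (w @ y_train) / s
    return f

def gradient_weights(X, y, h, t=None):
    """Estimate rho_i = E_n |f_{n,h}(X + t e_i) - f_{n,h}(X - t e_i)| / (2 t)
    for each coordinate i, with the paper's rule of thumb t = h/2."""
    n, d = X.shape
    t = h / 2 if t is None else t
    f = kernel_regressor(X, y, h)
    rho = np.zeros(d)
    for i in range(d):
        step = np.zeros(d)
        step[i] = t
        # average absolute central finite difference over the sample
        rho[i] = np.mean([abs(f(x + step) - f(x - step)) / (2 * t) for x in X])
    return rho
```

Coordinates along which the regression function varies strongly receive larger weights, which then rescale the metric used by downstream k-NN or kernel methods.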
Open Source Code | Yes | The code and all the data sets used in these experiments are publicly available at http://goo.gl/bCfS78
Open Datasets | Yes | The code and all the data sets used in these experiments are publicly available at http://goo.gl/bCfS78. We consider kernel, k-NN and SVM (support vector) approaches on a variety of controlled (artificial) and real-world datasets. The other data sets are taken from the UCI repository (Frank and Asuncion, 2012) and from (Torgo, 2012). The covertype data set... taken from the UCI repository (Frank and Asuncion, 2012) and from the LIBSVM website (Fan, 2012).
Dataset Splits | Yes | We use 1000 training points in the robotic, Telecom, Parkinson's, and Ailerons data sets, 2000 training points in Wine Quality, 730 in Concrete Strength, and 300 in Housing. We used 2000 test points in all of the problems, except for Concrete (300 points), Housing (200 points), and Robot Grasping (10000 points). Averages over 10 random experiments are reported. For all data sets, we normalize each coordinate with its standard deviation from the training data. To learn the metric, we set h by cross-validation on half the training points. The parameter k in k-NN, k-NN-ρ, k-NN-ρ2, and the bandwidth in KR, KR-ρ, KR-ρ2 are learned by cross-validation on half of the training points. All classification experiments are performed using 2000 points for testing and up to 3000 points for learning.
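The preprocessing step quoted above ("normalize each coordinate with its standard deviation from the training data") can be sketched like this; the function name and the zero-variance guard are my own assumptions, not from the paper.

```python
import numpy as np

def normalize_by_train_std(X_train, X_test):
    """Divide every coordinate by its training-set standard deviation,
    applying the same training-derived scale to the test set."""
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features (assumed convention)
    return X_train / std, X_test / std
```

Using the training statistics for both splits avoids leaking test-set information into the preprocessing.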
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions general methods like 'kernel regression', 'k-NN', 'SVM', and tools like the 'cover-tree of (Beygelzimer et al., 2006)', but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | In the majority of experiments (reported in the main body of the paper) we tune h, but we don't tune t and simply set t = h/2 as a rule of thumb. For all data sets, we normalize each coordinate with its standard deviation from the training data. The parameter k in k-NN, k-NN-ρ, k-NN-ρ2, and the bandwidth in KR, KR-ρ, KR-ρ2 are learned by cross-validation on half of the training points. We try the same range of k (from 1 to 5 log n) for the three k-NN methods (k-NN, k-NN-ρ, k-NN-ρ2). We try the same range of bandwidth/space-diameter h (a grid of size 0.02 from 1 down to 0.02) for the three KR methods. The probability P(Ci|x) of each class Ci, used for calculating the feature weights, is estimated by weighted k-NN with Gaussian kernel. Parameter t is set proportionally to the difference between the minimum and the maximum values of each feature to account for the differences between feature scales that remain after normalization.
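The quoted tuning protocol (coordinates rescaled by the gradient weights, then k cross-validated from 1 to 5 log n on half of the training points) could be mimicked as below. This is an illustrative sketch with hypothetical names, not the code released by the authors; a simple half/half holdout stands in for their cross-validation.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Plain k-NN regression under the Euclidean metric."""
    preds = []
    for x in X_query:
        idx = np.argsort(np.sum((X_train - x) ** 2, axis=1))[:k]
        preds.append(y_train[idx].mean())
    return np.array(preds)

def tune_k_weighted_knn(X, y, rho):
    """Rescale coordinates by gradient weights rho (giving a k-NN-rho variant),
    then pick k on half of the training points, trying k from 1 to 5 log n."""
    Xw = X * rho  # weighted metric = Euclidean distance on rescaled data
    n = len(X)
    half = n // 2
    ks = range(1, max(2, int(5 * np.log(n))) + 1)
    errs = {k: np.mean((knn_predict(Xw[:half], y[:half], Xw[half:], k)
                        - y[half:]) ** 2)
            for k in ks}
    best_k = min(errs, key=errs.get)
    return best_k, errs[best_k]
```

Because the weights enter only through a coordinate rescaling, any off-the-shelf k-NN or kernel implementation can be reused unchanged on the transformed data.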