Gradients Weights improve Regression and Classification

Authors: Samory Kpotufe, Abdeslam Boularias, Thomas Schultz, Kyoungok Kim

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the theoretical intuition in extensive experiments on many real-world datasets in Section 5. The resulting instantiations of GW evaluate successfully in practice as shown in Section 5.
Researcher Affiliation | Academia | Samory Kpotufe (EMAIL), Princeton University, Princeton, NJ; Abdeslam Boularias (EMAIL), Rutgers University, New Brunswick, NJ; Thomas Schultz (EMAIL), University of Bonn, Germany; Kyoungok Kim (EMAIL), Seoul National University of Science & Technology (Seoul Tech), Korea
Pseudocode | No | The paper describes the method (Gradient Weighting) and the estimator in textual form, e.g., 'More precisely (see Section 3) ρ_{n,i} has the form E_n |f_{n,h}(X + t·e_i) − f_{n,h}(X − t·e_i)| / 2t', but does not include any clearly labeled pseudocode or algorithm blocks.
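Since the paper gives the estimator only in textual form, the quoted expression can be sketched as follows. This is a minimal illustration, not the authors' released code; the kernel regressor, function names, and the choice of a Gaussian kernel are my own assumptions.

```python
import numpy as np

def kernel_regressor(X_train, y_train, h):
    """Nadaraya-Watson estimate f_{n,h} with a Gaussian kernel of bandwidth h
    (an assumed instantiation; the paper covers kernel and k-NN estimates)."""
    def f(x):
        w = np.exp(-np.sum((X_train - x) ** 2, axis=1) / (2 * h ** 2))
        s = w.sum()
        return y_train.mean() if s == 0 else (w @ y_train) / s
    return f

def gradient_weights(X, y, h, t=None):
    """Estimate rho_i = E_n |f_{n,h}(X + t e_i) - f_{n,h}(X - t e_i)| / (2 t)
    for each coordinate i, with the paper's rule of thumb t = h/2."""
    n, d = X.shape
    t = h / 2 if t is None else t
    f = kernel_regressor(X, y, h)
    rho = np.zeros(d)
    for i in range(d):
        step = np.zeros(d)
        step[i] = t
        # average absolute central finite difference over the sample
        rho[i] = np.mean([abs(f(x + step) - f(x - step)) / (2 * t) for x in X])
    return rho
```

Coordinates along which the regression function varies strongly receive larger weights, which then rescale the metric used by downstream k-NN or kernel methods.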
Open Source Code | Yes | The code and all the data sets used in these experiments are publicly available at http://goo.gl/bCfS78
Open Datasets | Yes | The code and all the data sets used in these experiments are publicly available at http://goo.gl/bCfS78. We consider kernel, k-NN and SVM (support vector) approaches on a variety of controlled (artificial) and real-world datasets. The other data sets are taken from the UCI repository (Frank and Asuncion, 2012) and from (Torgo, 2012). The covertype data set... taken from the UCI repository (Frank and Asuncion, 2012) and from the LIBSVM website (Fan, 2012).
Dataset Splits | Yes | We use 1000 training points in the robotic, Telecom, Parkinson's, and Ailerons data sets, 2000 training points in Wine Quality, 730 in Concrete Strength, and 300 in Housing. We used 2000 test points in all of the problems, except for Concrete (300 points), Housing (200 points), and Robot Grasping (10000 points). Averages over 10 random experiments are reported. For all data sets, we normalize each coordinate with its standard deviation from the training data. To learn the metric, we set h by cross-validation on half the training points. The parameter k in k-NN, k-NN-ρ, k-NN-ρ2, and the bandwidth in KR, KR-ρ, KR-ρ2 are learned by cross-validation on half of the training points. All classification experiments are performed using 2000 points for testing and up to 3000 points for learning.
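The preprocessing step quoted above ("normalize each coordinate with its standard deviation from the training data") can be sketched like this; the function name and the zero-variance guard are my own assumptions, not from the paper.

```python
import numpy as np

def normalize_by_train_std(X_train, X_test):
    """Divide every coordinate by its training-set standard deviation,
    applying the same training-derived scale to the test set."""
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features (assumed convention)
    return X_train / std, X_test / std
```

Using the training statistics for both splits avoids leaking test-set information into the preprocessing.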
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions general methods like 'kernel regression', 'k-NN', 'SVM', and tools like the 'cover-tree of (Beygelzimer et al., 2006)', but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | In the majority of experiments (reported in the main body of the paper) we tune h, but we don't tune t and simply set t = h/2 as a rule of thumb. For all data sets, we normalize each coordinate with its standard deviation from the training data. The parameter k in k-NN, k-NN-ρ, k-NN-ρ2, and the bandwidth in KR, KR-ρ, KR-ρ2 are learned by cross-validation on half of the training points. We try the same range of k (from 1 to 5 log n) for the three k-NN methods (k-NN, k-NN-ρ, k-NN-ρ2). We try the same range of bandwidth/space-diameter h (a grid of size 0.02 from 1 down to 0.02) for the three KR methods. The probability P(Ci|x) of each class Ci, used for calculating the feature weights, is estimated by weighted k-NN with Gaussian kernel. Parameter t is set proportionally to the difference between the minimum and the maximum values of each feature to account for the differences between feature scales that remain after normalization.
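The quoted tuning protocol (coordinates rescaled by the gradient weights, then k cross-validated from 1 to 5 log n on half of the training points) could be mimicked as below. This is an illustrative sketch with hypothetical names, not the code released by the authors; a simple half/half holdout stands in for their cross-validation.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Plain k-NN regression under the Euclidean metric."""
    preds = []
    for x in X_query:
        idx = np.argsort(np.sum((X_train - x) ** 2, axis=1))[:k]
        preds.append(y_train[idx].mean())
    return np.array(preds)

def tune_k_weighted_knn(X, y, rho):
    """Rescale coordinates by gradient weights rho (giving a k-NN-rho variant),
    then pick k on half of the training points, trying k from 1 to 5 log n."""
    Xw = X * rho  # weighted metric = Euclidean distance on rescaled data
    n = len(X)
    half = n // 2
    ks = range(1, max(2, int(5 * np.log(n))) + 1)
    errs = {k: np.mean((knn_predict(Xw[:half], y[:half], Xw[half:], k)
                        - y[half:]) ** 2)
            for k in ks}
    best_k = min(errs, key=errs.get)
    return best_k, errs[best_k]
```

Because the weights enter only through a coordinate rescaling, any off-the-shelf k-NN or kernel implementation can be reused unchanged on the transformed data.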