Hinge-Minimax Learner for the Ensemble of Hyperplanes

Authors: Dolev Raviv, Tamir Hazan, Margarita Osadchy

JMLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The empirical evaluation of the proposed models shows their advantage over the existing methods in a small labeled-training-data regime. We performed empirical evaluation of the proposed models: the K-hyperplane, the LHM, and the multi-class models.
Researcher Affiliation | Academia | Dolev Raviv (EMAIL), Department of Computer Science, University of Haifa, Haifa 31905, Israel; Tamir Hazan (EMAIL), Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology, Haifa 32000, Israel; Margarita Osadchy (EMAIL), Department of Computer Science, University of Haifa, Haifa 31905, Israel.
Pseudocode | Yes | Algorithm 1: KHHM Training; Algorithm 2: LHM Training.
Open Source Code | No | The paper mentions third-party tools such as LIBSVM, the CVX optimization package, the MATLAB Statistics Toolbox, and MatConvNet (Vedaldi and Lenc, 2015), but provides no statement or link for the authors' own implementation code.
Open Datasets | Yes | We construct the KHHM classifier for 2D data to illustrate Algorithm 1. We sampled 5000 data points from two highly overlapping Gaussians... The following tests were performed on a data set of letters from the UCI Machine Learning Repository (Murphy and Aha, 1994). In this test we used 397 scene categories of the SUN database, which have at least 100 images per category (Xiao et al., 2010). We downloaded the features from the SUN web page. Next, we compared the LHM classifier to alternative ensembles of linear classifiers on the PASCAL VOC 2007 dataset (Everingham et al., 2010). We used CIFAR-10, composed of 10 categories (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck), as the source problem. For the worst-case transfer learning, we picked a subset of 5 classes (train, bottle, cattle, forest, and sweet peppers) from CIFAR-100.
Dataset Splits | Yes | Each class was equally partitioned into training, validation, and test sets. For each letter, we used 100 samples for training, 250 for validation, and the rest for test (about 400 samples per letter). The data is divided into 50 training and 50 test images in 10 folds. We trained binary classifiers for pairs of classes from CIFAR-10 using imbalanced training sets, in which the negative class included all samples from all other classes (40,000 examples) and the positive class included a varying number of samples (140, 300, 600, 1400, 2000, 5000 (all)). We varied the size of the positive training set between 20, 50, 100, 250, 500 (all) samples and we used all 2,000 samples of other classes as the negative training set.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or machine specifications used for running experiments. It mentions training a LeNet model but not on what specific hardware.
Software Dependencies | No | The paper mentions software components such as LIBSVM, the CVX optimization package, the MATLAB Statistics Toolbox, and MatConvNet (Vedaldi and Lenc, 2015). However, it does not provide version numbers for LIBSVM, CVX, or the MATLAB Statistics Toolbox, and while MatConvNet is cited with a year (2015), a concrete software version is not specified in the text.
Experiment Setup | Yes | We estimated the mean and covariance from the training data and tuned the parameters (C and γ) and the bias using the validation set. The parameters of all methods have been chosen using the validation set. The LHM model was trained with 2 hidden components and 3 hyperplanes per component. We set the number of hyperplanes in each component to 2 and varied the number of components from 2 to 5. We repeated each experiment 50 times over different random subsets of training samples and random initializations of the NN and averaged the results. We fine-tuned the weights with a very fast training (just a handful of epochs, while training from scratch requires two orders of magnitude more training epochs).
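The per-class split protocol quoted under Dataset Splits (for each letter: 100 samples for training, 250 for validation, the rest for test) can be sketched as follows. This is a minimal illustration; the function name and interface are assumptions, not taken from the paper.

```python
import numpy as np

def split_per_class(X, y, n_train=100, n_val=250, seed=0):
    """Partition each class into train/validation/test index sets,
    taking n_train then n_val samples per class and leaving the
    remainder for test (illustrative helper, not from the paper)."""
    rng = np.random.default_rng(seed)
    splits = {"train": [], "val": [], "test": []}
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        splits["train"].append(idx[:n_train])
        splits["val"].append(idx[n_train:n_train + n_val])
        splits["test"].append(idx[n_train + n_val:])
    return {k: np.concatenate(v) for k, v in splits.items()}

# Toy check: 3 classes of 400 samples each (roughly the per-letter
# count quoted above) gives 100/250/50 per class.
y = np.repeat([0, 1, 2], 400)
X = np.zeros((len(y), 2))
parts = split_per_class(X, y)
print(len(parts["train"]), len(parts["val"]), len(parts["test"]))  # 300 750 150
```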
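The imbalanced binary training sets described for the CIFAR-10 experiments (all samples of the other classes as negatives, a varying number of positives) can be sketched in the same spirit. Again, the helper name and signature are illustrative assumptions:

```python
import numpy as np

def make_imbalanced_binary(X, y, pos_class, n_pos, seed=0):
    """Build a binary training set as in the quoted CIFAR-10 protocol:
    negatives are all samples of every other class, positives are
    n_pos randomly chosen samples of pos_class (illustrative helper)."""
    rng = np.random.default_rng(seed)
    pos_idx = rng.permutation(np.flatnonzero(y == pos_class))[:n_pos]
    neg_idx = np.flatnonzero(y != pos_class)
    Xb = np.concatenate([X[pos_idx], X[neg_idx]])
    yb = np.concatenate([np.ones(len(pos_idx)), -np.ones(len(neg_idx))])
    return Xb, yb

# Toy check: 10 classes x 100 samples, 14 positives for one class
# leaves all 900 samples of the other classes as negatives.
y = np.repeat(np.arange(10), 100)
X = np.zeros((len(y), 4))
Xb, yb = make_imbalanced_binary(X, y, pos_class=3, n_pos=14)
print((yb == 1).sum(), (yb == -1).sum())  # 14 900
```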