Hinge-Minimax Learner for the Ensemble of Hyperplanes

Authors: Dolev Raviv, Tamir Hazan, Margarita Osadchy

JMLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The empirical evaluation of the proposed models shows their advantage over the existing methods in a small labeled-training-data regime. We performed empirical evaluation of the proposed models: the K-hyperplane, the LHM, and the multi-class models.
Researcher Affiliation | Academia | Dolev Raviv (EMAIL), Department of Computer Science, University of Haifa, Haifa 31905, Israel; Tamir Hazan (EMAIL), Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology, Haifa 32000, Israel; Margarita Osadchy (EMAIL), Department of Computer Science, University of Haifa, Haifa 31905, Israel.
Pseudocode | Yes | Algorithm 1: KHHM Training; Algorithm 2: LHM Training.
Open Source Code | No | The paper mentions third-party tools such as LIBSVM, the CVX optimization package, the MATLAB Statistics Toolbox, and MatConvNet (Vedaldi and Lenc, 2015), but provides no statement or link for the authors' own implementation code.
Open Datasets | Yes | We construct the KHHM classifier for 2D data to illustrate Algorithm 1. We sampled 5000 data points from two highly overlapping Gaussians... The following tests were performed on a data set of letters from the UCI Machine Learning Repository (Murphy and Aha, 1994). In this test we used 397 scene categories of the SUN database, which have at least 100 images per category (Xiao et al., 2010). We downloaded the features from the SUN web page. Next, we compared the LHM classifier to alternative ensembles of linear classifiers on the PASCAL VOC 2007 dataset (Everingham et al., 2010). We used CIFAR-10, composed of 10 categories (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck), as the source problem. For the worst-case transfer learning, we picked a subset of 5 classes (train, bottle, cattle, forest, and sweet peppers) from CIFAR-100.
Dataset Splits | Yes | Each class was equally partitioned into training, validation, and test sets. For each letter, we used 100 samples for training, 250 for validation, and the rest for test (about 400 samples per letter). The data is divided into 50 training and 50 test images in 10 folds. We trained binary classifiers for pairs of classes from CIFAR-10 using imbalanced training sets, in which the negative class included all samples from all other classes (40,000 examples) and the positive class included a varying number of samples (140, 300, 600, 1400, 2000, 5000 (all)). We varied the size of the positive training set between 20, 50, 100, 250, 500 (all) samples and we used all 2,000 samples of other classes as the negative training set.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or machine specifications used for running experiments. It mentions training a LeNet model but not on what specific hardware.
Software Dependencies | No | The paper mentions software components such as LIBSVM, the CVX optimization package, the MATLAB Statistics Toolbox, and MatConvNet (Vedaldi and Lenc, 2015). However, it does not provide version numbers for LIBSVM, CVX, or the MATLAB Statistics Toolbox, and while MatConvNet is cited with a year (2015), a concrete software version is not specified in the text.
Experiment Setup | Yes | We estimated the mean and covariance from the training data and tuned the parameters (C and γ) and the bias using the validation set. The parameters of all methods have been chosen using the validation set. The LHM model was trained with 2 hidden components and 3 hyperplanes per component. We set the number of hyperplanes in each component to 2 and varied the number of components from 2 to 5. We repeated each experiment 50 times over different random subsets of training samples and random initializations of the NN and averaged the results. We fine-tuned the weights with a very fast training (just a handful of epochs, while training from scratch requires two orders of magnitude more training epochs).
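The per-class split protocol quoted under Dataset Splits (for each letter: 100 samples for training, 250 for validation, the rest for test) can be sketched as follows. This is a minimal illustration; the function name and interface are assumptions, not taken from the paper.

```python
import numpy as np

def split_per_class(X, y, n_train=100, n_val=250, seed=0):
    """Partition each class into train/validation/test index sets,
    taking n_train then n_val samples per class and leaving the
    remainder for test (illustrative helper, not from the paper)."""
    rng = np.random.default_rng(seed)
    splits = {"train": [], "val": [], "test": []}
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        splits["train"].append(idx[:n_train])
        splits["val"].append(idx[n_train:n_train + n_val])
        splits["test"].append(idx[n_train + n_val:])
    return {k: np.concatenate(v) for k, v in splits.items()}

# Toy check: 3 classes of 400 samples each (roughly the per-letter
# count quoted above) gives 100/250/50 per class.
y = np.repeat([0, 1, 2], 400)
X = np.zeros((len(y), 2))
parts = split_per_class(X, y)
print(len(parts["train"]), len(parts["val"]), len(parts["test"]))  # 300 750 150
```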
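The imbalanced binary training sets described for the CIFAR-10 experiments (all samples of the other classes as negatives, a varying number of positives) can be sketched in the same spirit. Again, the helper name and signature are illustrative assumptions:

```python
import numpy as np

def make_imbalanced_binary(X, y, pos_class, n_pos, seed=0):
    """Build a binary training set as in the quoted CIFAR-10 protocol:
    negatives are all samples of every other class, positives are
    n_pos randomly chosen samples of pos_class (illustrative helper)."""
    rng = np.random.default_rng(seed)
    pos_idx = rng.permutation(np.flatnonzero(y == pos_class))[:n_pos]
    neg_idx = np.flatnonzero(y != pos_class)
    Xb = np.concatenate([X[pos_idx], X[neg_idx]])
    yb = np.concatenate([np.ones(len(pos_idx)), -np.ones(len(neg_idx))])
    return Xb, yb

# Toy check: 10 classes x 100 samples, 14 positives for one class
# leaves all 900 samples of the other classes as negatives.
y = np.repeat(np.arange(10), 100)
X = np.zeros((len(y), 4))
Xb, yb = make_imbalanced_binary(X, y, pos_class=3, n_pos=14)
print((yb == 1).sum(), (yb == -1).sum())  # 14 900
```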