Training Highly Multiclass Classifiers

Authors: Maya R. Gupta, Samy Bengio, Jason Weston

JMLR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on ImageNet benchmark data sets and proprietary image recognition problems with 15,000 to 97,000 classes show substantial gains in classification accuracy compared to one-vs-all linear SVMs and Wsabie. Keywords: large-scale, classification, multiclass, online learning, stochastic gradient
Researcher Affiliation | Industry | Maya R. Gupta (EMAIL), Google Inc., 1600 Amphitheatre Pkwy, Mountain View, CA 94301, USA; Samy Bengio (EMAIL), Google Inc., 1600 Amphitheatre Pkwy, Mountain View, CA 94301, USA; Jason Weston (EMAIL), Google Inc., 76 9th Avenue, New York, NY 10011, USA
Pseudocode | Yes | Table 3: Wsabie++ training (for Euclidean discriminants).
  Model:
    Training data pairs: (x_t, Y_t) for t = 1, 2, ..., n
    Embedded Euclidean discriminant: f(Wx; β_g) = −(β_g − Wx)^T (β_g − Wx)
  Hyperparameters:
    Embedding dimension: m
    Stepsize: λ ∈ R+
    Margin: b ∈ R+
    Depth of last-violator chain: Q ∈ N
  Initialize:
    W_{j,r} set randomly to +1 or −1 for j = 1, 2, ..., m, r = 1, 2, ..., d
    β_g = 0 for all g = 1, 2, ..., G
    α_g = 0 for all g = 1, 2, ..., G
    α_{W_j} = 0 for all j = 1, 2, ..., m
    v_{y+} = empty set for all y+
  While not converged:
    Sample x_t uniformly from {x_1, ..., x_n}.
    Sample y+ uniformly from Y_t.
    If |b − f(Wx_t; β_{y+}) + f(Wx_t; β_{v_q^{y+}})|_+ > 0 for any q = 1, 2, ..., Q, continue.
    Set foundViolator = false.
    For count = 1 to G:
      Sample y− uniformly from Y_t^C.
      If |b − f(Wx_t; β_{y+}) + f(Wx_t; β_{y−})|_+ > 0, set foundViolator = true and break.
    If foundViolator = false, set v_{y+} to the empty set and continue.
    Set v_{y+} = y−.
    Compute the stochastic gradients:
      ∇_{y+} = 2(β_{y+} − Wx_t)
      ∇_{y−} = −2(β_{y−} − Wx_t)
      ∇_W = (β_{y−} − β_{y+}) x_t^T.
    Update the adagrad parameters:
      α_{y+} = α_{y+} + (1/d) ∇_{y+}^T ∇_{y+}
      α_{y−} = α_{y−} + (1/d) ∇_{y−}^T ∇_{y−}
      α_{W_j} = α_{W_j} + (1/d) ∇_{W_j}^T ∇_{W_j} for j = 1, 2, ..., m.
    Update the classifier parameters:
      β_{y+} = β_{y+} − (λ/√α_{y+}) ∇_{y+}
      β_{y−} = β_{y−} − (λ/√α_{y−}) ∇_{y−}
      W_j = W_j − (λ/√α_{W_j}) ∇_{W_j} for j = 1, 2, ..., m.
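A single training update of the kind Table 3 describes can be sketched in Python as follows. This is a minimal sketch, not the authors' C++ implementation: the function and variable names, the adagrad epsilon term, and the use of a simple rejection-sampling loop (without the last-violator chain) are all assumptions.

```python
import numpy as np

def wsabie_pp_step(W, beta, alpha_beta, alpha_W, x, y_pos, neg_classes,
                   lam=0.1, margin=1.0, eps=1e-8, rng=None):
    """One sketched Wsabie++ update with the Euclidean discriminant
    f(Wx; beta_g) = -||beta_g - Wx||^2. A negative class y- violates when
    margin - f(Wx; beta_{y+}) + f(Wx; beta_{y-}) > 0."""
    rng = rng or np.random.default_rng()
    z = W @ x                                    # embedded input, shape (m,)
    f = lambda b: -np.sum((b - z) ** 2)          # negative squared distance

    # Rejection-sample a violating negative class.
    for y_neg in rng.permutation(neg_classes):
        if margin - f(beta[y_pos]) + f(beta[y_neg]) > 0:
            break
    else:
        return None                              # no violator found; skip update

    # Stochastic gradients of the hinge loss for this (x, y+, y-) triple.
    g_pos = 2.0 * (beta[y_pos] - z)
    g_neg = -2.0 * (beta[y_neg] - z)
    g_W = np.outer(beta[y_neg] - beta[y_pos], x)

    # Adagrad accumulators: running sum of per-dimension-averaged squared gradients.
    d = x.size
    alpha_beta[y_pos] += g_pos @ g_pos / d
    alpha_beta[y_neg] += g_neg @ g_neg / d
    alpha_W += np.sum(g_W ** 2, axis=1) / d      # one accumulator per row of W

    # Adagrad-scaled parameter updates.
    beta[y_pos] -= lam / np.sqrt(alpha_beta[y_pos] + eps) * g_pos
    beta[y_neg] -= lam / np.sqrt(alpha_beta[y_neg] + eps) * g_neg
    W -= (lam / np.sqrt(alpha_W + eps))[:, None] * g_W
    return y_neg
```

In practice the function would be called inside the "while not converged" loop, with (x_t, y+) sampled from the training pairs on each iteration.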
Open Source Code | No | The paper does not provide explicit statements or links for open-sourcing the code for the methodology described. It mentions that algorithms were "Implemented in C++" but does not offer access.
Open Datasets | Yes | ImageNet (Deng et al., 2009) is a large image data set organized according to WordNet (Fellbaum, 1998). Concepts in WordNet, described by multiple words or word phrases, are hierarchically organized. ImageNet is a growing image data set that attaches one of these concepts to each image using a quality-controlled, human-verified labeling process. We used the spring 2010 and fall 2011 releases of the ImageNet data set.
Dataset Splits | Yes | For both data sets, we separated out 10% of the examples for validation, 10% for test, and the remaining 80% was used for training. The 21k Web Data contains about 9M images, divided into 20% for validation, 20% for test, and 60% for train, and the images are labelled with 21,171 distinct classes. The 97k Web Data contains about 40M images, divided into 10% for validation, 10% for test, and 80% for train, and the images are labelled with 96,812 distinct classes.
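The splits described above are simple random partitions. A minimal sketch of such a partition (the function name and seeded shuffle are assumptions, not the paper's procedure):

```python
import numpy as np

def split_indices(n, frac_val=0.1, frac_test=0.1, seed=0):
    """Shuffle n example indices and partition them into train/validation/test
    sets (80/10/10 by default, matching the ImageNet splits described)."""
    idx = np.random.default_rng(seed).permutation(n)
    n_val = int(n * frac_val)
    n_test = int(n * frac_test)
    val = idx[:n_val]
    test = idx[n_val:n_val + n_test]
    train = idx[n_val + n_test:]
    return train, val, test
```

For the 21k Web Data, the fractions would be `frac_val=0.2, frac_test=0.2` instead.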
Hardware Specification | No | The paper mentions training times and implementation in C++ but does not specify any particular CPU or GPU models, memory, or other detailed hardware specifications. For example, it states: "Implemented in C++ without parallelization, all algorithms (except nearest means) took around one week to train the 16k ImageNet data set, around two weeks to train the 21k and 22k data sets, and around one month to train the 97k data set."
Software Dependencies | No | The paper states that the algorithms were "Implemented in C++ without parallelization" but does not provide any specific version numbers for C++ compilers, libraries, or other software dependencies.
Experiment Setup | Yes | Table 9: Classifier parameters chosen using the validation set (stepsize, margin, embedding dimension, and number of last violators for each classifier and data set). For example, Wsabie++ on 16k ImageNet used stepsize 10, margin 10,000, embedding dimension 192, and 8 last violators.
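The Table 9 values for this example can be collected into a plain configuration mapping; the key names are illustrative and not taken from the paper's code.

```python
# Hyperparameters reported as chosen on the validation set for
# Wsabie++ on 16k ImageNet (Table 9). Key names are assumptions.
wsabie_pp_16k_imagenet = {
    "stepsize": 10,          # λ in the Table 3 pseudocode
    "margin": 10_000,        # b in the Table 3 pseudocode
    "embedding_dim": 192,    # m, dimension of the embedded space
    "num_last_violators": 8, # Q, depth of the last-violator chain
}
```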