Training Highly Multiclass Classifiers

Authors: Maya R. Gupta, Samy Bengio, Jason Weston

JMLR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on ImageNet benchmark data sets and proprietary image recognition problems with 15,000 to 97,000 classes show substantial gains in classification accuracy compared to one-vs-all linear SVMs and Wsabie. Keywords: large-scale, classification, multiclass, online learning, stochastic gradient
Researcher Affiliation | Industry | Maya R. Gupta (EMAIL), Google Inc., 1600 Amphitheatre Pkwy, Mountain View, CA 94301, USA; Samy Bengio (EMAIL), Google Inc., 1600 Amphitheatre Pkwy, Mountain View, CA 94301, USA; Jason Weston (EMAIL), Google Inc., 76 9th Avenue, New York, NY 10011, USA
Pseudocode | Yes | Table 3: Wsabie++ training (for Euclidean discriminants).
  Model:
    Training data pairs: (x_t, Y_t) for t = 1, 2, ..., n
    Embedded Euclidean discriminant: f(Wx; β_g) = −(β_g − Wx)^T (β_g − Wx)
  Hyperparameters:
    Embedding dimension: m
    Stepsize: λ ∈ R+
    Margin: b ∈ R+
    Depth of last-violator chain: Q ∈ N
  Initialize:
    W_{j,r} set randomly to +1 or −1 for j = 1, 2, ..., m, r = 1, 2, ..., d
    β_g = 0 for all g = 1, 2, ..., G
    α_g = 0 for all g = 1, 2, ..., G
    α_{W_j} = 0 for all j = 1, 2, ..., m
    v_{y+} = empty set for all y+
  While not converged:
    Sample x_t uniformly from {x_1, ..., x_n}.
    Sample y+ uniformly from Y_t.
    If |b − f(Wx_t; β_{y+}) + f(Wx_t; β_{v_q^{y+}})|_+ > 0 for any q = 1, 2, ..., Q, continue.
    Set foundViolator = false.
    For count = 1 to G:
      Sample y− uniformly from Y_t^C.
      If |b − f(Wx_t; β_{y+}) + f(Wx_t; β_{y−})|_+ > 0, set foundViolator = true and break.
    If foundViolator = false, set v_{y+} to the empty set and continue.
    Set v_{y+} = y−.
    Compute the stochastic gradients:
      ∇_{y+} = 2(β_{y+} − Wx_t)
      ∇_{y−} = −2(β_{y−} − Wx_t)
      ∇_W = (β_{y−} − β_{y+}) x_t^T.
    Update the adagrad parameters:
      α_{y+} = α_{y+} + (1/d) ∇_{y+}^T ∇_{y+}
      α_{y−} = α_{y−} + (1/d) ∇_{y−}^T ∇_{y−}
      α_{W_j} = α_{W_j} + (1/d) ∇_{W_j}^T ∇_{W_j} for j = 1, 2, ..., m.
    Update the classifier parameters:
      β_{y+} = β_{y+} − (λ/√α_{y+}) ∇_{y+}
      β_{y−} = β_{y−} − (λ/√α_{y−}) ∇_{y−}
      W_j = W_j − (λ/√α_{W_j}) ∇_{W_j} for j = 1, 2, ..., m.
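A single training update of the kind Table 3 describes can be sketched in Python as follows. This is a minimal sketch, not the authors' C++ implementation: the function and variable names, the adagrad epsilon term, and the use of a simple rejection-sampling loop (without the last-violator chain) are all assumptions.

```python
import numpy as np

def wsabie_pp_step(W, beta, alpha_beta, alpha_W, x, y_pos, neg_classes,
                   lam=0.1, margin=1.0, eps=1e-8, rng=None):
    """One sketched Wsabie++ update with the Euclidean discriminant
    f(Wx; beta_g) = -||beta_g - Wx||^2. A negative class y- violates when
    margin - f(Wx; beta_{y+}) + f(Wx; beta_{y-}) > 0."""
    rng = rng or np.random.default_rng()
    z = W @ x                                    # embedded input, shape (m,)
    f = lambda b: -np.sum((b - z) ** 2)          # negative squared distance

    # Rejection-sample a violating negative class.
    for y_neg in rng.permutation(neg_classes):
        if margin - f(beta[y_pos]) + f(beta[y_neg]) > 0:
            break
    else:
        return None                              # no violator found; skip update

    # Stochastic gradients of the hinge loss for this (x, y+, y-) triple.
    g_pos = 2.0 * (beta[y_pos] - z)
    g_neg = -2.0 * (beta[y_neg] - z)
    g_W = np.outer(beta[y_neg] - beta[y_pos], x)

    # Adagrad accumulators: running sum of per-dimension-averaged squared gradients.
    d = x.size
    alpha_beta[y_pos] += g_pos @ g_pos / d
    alpha_beta[y_neg] += g_neg @ g_neg / d
    alpha_W += np.sum(g_W ** 2, axis=1) / d      # one accumulator per row of W

    # Adagrad-scaled parameter updates.
    beta[y_pos] -= lam / np.sqrt(alpha_beta[y_pos] + eps) * g_pos
    beta[y_neg] -= lam / np.sqrt(alpha_beta[y_neg] + eps) * g_neg
    W -= (lam / np.sqrt(alpha_W + eps))[:, None] * g_W
    return y_neg
```

In practice the function would be called inside the "while not converged" loop, with (x_t, y+) sampled from the training pairs on each iteration.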
Open Source Code | No | The paper does not provide explicit statements or links for open-sourcing the code for the methodology described. It mentions that algorithms were "Implemented in C++" but does not offer access.
Open Datasets | Yes | ImageNet (Deng et al., 2009) is a large image data set organized according to WordNet (Fellbaum, 1998). Concepts in WordNet, described by multiple words or word phrases, are hierarchically organized. ImageNet is a growing image data set that attaches one of these concepts to each image using a quality-controlled, human-verified labeling process. We used the spring 2010 and fall 2011 releases of the ImageNet data set.
Dataset Splits | Yes | For both data sets, we separated out 10% of the examples for validation, 10% for test, and the remaining 80% was used for training. The 21k Web Data contains about 9M images, divided into 20% for validation, 20% for test, and 60% for train, and the images are labelled with 21,171 distinct classes. The 97k Web Data contains about 40M images, divided into 10% for validation, 10% for test, and 80% for train, and the images are labelled with 96,812 distinct classes.
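The splits described above are simple random partitions. A minimal sketch of such a partition (the function name and seeded shuffle are assumptions, not the paper's procedure):

```python
import numpy as np

def split_indices(n, frac_val=0.1, frac_test=0.1, seed=0):
    """Shuffle n example indices and partition them into train/validation/test
    sets (80/10/10 by default, matching the ImageNet splits described)."""
    idx = np.random.default_rng(seed).permutation(n)
    n_val = int(n * frac_val)
    n_test = int(n * frac_test)
    val = idx[:n_val]
    test = idx[n_val:n_val + n_test]
    train = idx[n_val + n_test:]
    return train, val, test
```

For the 21k Web Data, the fractions would be `frac_val=0.2, frac_test=0.2` instead.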
Hardware Specification | No | The paper mentions training times and implementation in C++ but does not specify any particular CPU or GPU models, memory, or other detailed hardware specifications. For example, it states: "Implemented in C++ without parallelization, all algorithms (except nearest means) took around one week to train the 16k ImageNet data set, around two weeks to train the 21k and 22k data sets, and around one month to train the 97k data set."
Software Dependencies | No | The paper states that the algorithms were "Implemented in C++ without parallelization" but does not provide any specific version numbers for C++ compilers, libraries, or other software dependencies.
Experiment Setup | Yes | Table 9: Classifier parameters chosen using the validation set (stepsize, margin, embedding dimension, and number of last violators for each classifier and data set). For example, Wsabie++ on 16k ImageNet used stepsize 10, margin 10,000, embedding dimension 192, and 8 last violators.
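The Table 9 values for this example can be collected into a plain configuration mapping; the key names are illustrative and not taken from the paper's code.

```python
# Hyperparameters reported as chosen on the validation set for
# Wsabie++ on 16k ImageNet (Table 9). Key names are assumptions.
wsabie_pp_16k_imagenet = {
    "stepsize": 10,          # λ in the Table 3 pseudocode
    "margin": 10_000,        # b in the Table 3 pseudocode
    "embedding_dim": 192,    # m, dimension of the embedded space
    "num_last_violators": 8, # Q, depth of the last-violator chain
}
```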