A Reliable Effective Terascale Linear Learning System
Authors: Alekh Agarwal, Olivier Chapelle, Miroslav Dudík, John Langford
JMLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we will present the empirical evaluation of the system described so far. We begin by describing the datasets used, before evaluating the various properties of our framework both from a systems as well as a machine learning perspective. |
| Researcher Affiliation | Industry | Alekh Agarwal (EMAIL), Microsoft Research, 641 Avenue of the Americas, New York, NY 10011; Olivier Chapelle (EMAIL), Criteo, 411 High Street, Palo Alto, CA 94301; Miroslav Dudík (EMAIL) and John Langford (EMAIL), Microsoft Research, 641 Avenue of the Americas, New York, NY 10011 |
| Pseudocode | Yes | Algorithm 1 (Stochastic gradient descent on a single node using adaptive gradient updates; Duchi et al., 2010; McMahan and Streeter, 2010). Require: invariance update function s (see Karampatziakis and Langford, 2011). Initialize w = 0, G = I. For all (x, y) in the training set: g ← ∇_w ℓ(w⊤x; y); w ← w − s(w, x, y) G^(−1/2) g; G_jj ← G_jj + g_j² for all j = 1, ..., d. Algorithm 2 (Sketch of the proposed learning architecture). Require: data split across nodes. For all nodes k: w_k = result of stochastic gradient descent on the data of node k using Algorithm 1. Compute the weighted average w̄ as in (2) using AllReduce. Start a preconditioned L-BFGS optimization from w̄. For t = 1, ..., T, on all nodes k: compute g_k, the (local batch) gradient of examples on node k; compute g = Σ_{k=1}^{m} g_k using AllReduce; add the regularization part of the gradient; take an L-BFGS step. |
| Open Source Code | Yes | Our implementation is available as part of the open source project Vowpal Wabbit (Langford et al., 2007) and is summarized in Algorithm 2. |
| Open Datasets | Yes | The second dataset comes from the task of recognizing a human acceptor splice site (Sonnenburg and Franc, 2010). We consider this learning task because this is, as far as we know, the largest public data set for which subsampling is not an effective learning strategy. |
| Dataset Splits | Yes | We follow the same experimental protocol as in Sonnenburg and Franc (2010): we use the same training and test sets of respectively 50M and 4.6M samples. |
| Hardware Specification | No | We present a system and a set of techniques for learning linear predictors with convex losses on terascale data sets, with trillions of features, billions of training examples and millions of parameters in an hour using a cluster of 1000 machines. |
| Software Dependencies | No | MapReduce (Dean and Ghemawat, 2008) and its open source implementation Hadoop have become the overwhelmingly favorite platforms for distributed data processing. |
| Experiment Setup | Yes | We ran 5 iterations of L-BFGS on the splice site data with 1000 nodes. On each node, we recorded, for every iteration, the time spent in AllReduce and the computing time, defined as the time not spent in AllReduce. The regularization coefficient was chosen from a small set to obtain the best test performance. We used stochastic gradient descent with the learning rate in the t-th iteration set to η_t = 1/(L + γ√t); we tuned γ and L on a small subset of the data set. They suggest global minibatch sizes of no more than b ∝ √n. On m nodes, each node accumulates gradients from b/m examples, then an AllReduce operation is carried out, yielding the mini-batch gradient, and each node performs a stochastic gradient update with the learning rate of the form... |
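The per-node training step (Algorithm 1 in the table above) can be sketched as a short NumPy routine. This is a minimal single-machine illustration, not the Vowpal Wabbit implementation: the importance-invariant step-size function `s` of Karampatziakis and Langford (2011) is stubbed as a constant, and `loss_grad` (the derivative of the loss with respect to the prediction) is an assumed helper.

```python
import numpy as np

def adagrad_sgd(examples, d, loss_grad, s=lambda w, x, y: 0.5):
    """Sketch of Algorithm 1: per-coordinate adaptive-gradient SGD.

    `loss_grad(p, y)` is dℓ/dp at the prediction p = w·x.
    `s(w, x, y)` stands in for the invariance update function of
    Karampatziakis and Langford (2011); a constant here (an assumption).
    """
    w = np.zeros(d)   # w = 0
    G = np.ones(d)    # G = I, kept as its diagonal
    for x, y in examples:
        p = w @ x
        g = loss_grad(p, y) * x             # g = grad_w l(w.x; y)
        w -= s(w, x, y) * g / np.sqrt(G)    # w <- w - s(w,x,y) G^{-1/2} g
        G += g ** 2                         # G_jj <- G_jj + g_j^2
    return w
```

For squared loss, `loss_grad = lambda p, y: p - y`; per-coordinate step sizes then shrink automatically on frequently active features, which is the point of the adaptive-gradient update.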
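The distributed mini-batch scheme described in the setup row (each of m nodes accumulates gradients over b/m examples, an AllReduce sums them into the global mini-batch gradient, and every node takes the identical step with η_t = 1/(L + γ√t)) can be simulated in a single process. The shard loop below stands in for the m nodes and `np.sum` over local gradients stands in for AllReduce; squared loss and all default constants are assumptions of this sketch.

```python
import numpy as np

def minibatch_sgd_simulated(X, y, m=4, b=64, L=1.0, gamma=1.0, epochs=30):
    """Single-process simulation of distributed mini-batch SGD
    (squared loss). Each "node" averages gradients over its b/m
    examples; summing the local gradients plays the role of AllReduce;
    all nodes then apply the same update with eta_t = 1/(L + gamma*sqrt(t))."""
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for start in range(0, n - b + 1, b):
            t += 1
            shards = np.array_split(np.arange(start, start + b), m)
            # local gradient on each node, normalized so the sum is the
            # mean gradient over the full mini-batch of b examples
            local = [X[idx].T @ (X[idx] @ w - y[idx]) / b for idx in shards]
            g = np.sum(local, axis=0)                 # AllReduce: sum over nodes
            eta = 1.0 / (L + gamma * np.sqrt(t))      # eta_t = 1/(L + gamma*sqrt(t))
            w -= eta * g                              # identical step on every node
    return w
```

Because every node ends each round with the same summed gradient and applies the same learning rate, the weight vectors stay synchronized across nodes without any further communication, which is what makes the AllReduce primitive sufficient here.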