Second-Order Stochastic Optimization for Machine Learning in Linear Time
Authors: Naman Agarwal, Brian Bullins, Elad Hazan
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we present experimental evaluation for our theoretical results. We perform the experiments for a classification task over two labels using the logistic regression (LR) objective function with the ℓ2 regularizer. For all of the classification tasks we choose two values of λ: 1/m and 10/m, where m is the number of training examples. We perform the above classification tasks over four data sets: MNIST, CoverType, Mushrooms, and RealSIM. Figure 2 displays the log-error achieved by LiSSA as compared to two standard first-order algorithms, SVRG and SAGA (Johnson and Zhang, 2013; Defazio et al., 2014), in terms of the number of passes over the data. Figure 3 presents the performance of LiSSA as compared to NewSamp (Erdogdu and Montanari, 2015) and standard Newton's method with respect to both time and iterations. |
| Researcher Affiliation | Academia | Naman Agarwal, Brian Bullins, and Elad Hazan: Computer Science Department, Princeton University, Princeton, NJ 08540, USA |
| Pseudocode | Yes | Algorithm 1: LiSSA (Linear (time) Stochastic Second-Order Algorithm); Algorithm 2: LiSSA-Quad; Algorithm 3: Repeated Halving; Algorithm 4: Fast Quadratic Solver (FQS); Algorithm 5: Ellipsoid Cover; Algorithm 6: N-SVRG |
| Open Source Code | Yes | The paper's footnote 7 states: 'Our code for LiSSA can be found here: https://github.com/brianbullins/lissa_code.' |
| Open Datasets | Yes | We perform the above classification tasks over four data sets: MNIST, CoverType, Mushrooms, and RealSIM. Table 2 provides details regarding the data sets chosen for the experiments: MNIST4-9 (m = 11791, d = 784; LeCun and Cortes, 1998), Mushrooms (m = 8124, d = 112; Lichman, 2013), CoverType (m = 100000, d = 54; Blackard and Dean, 1999), and RealSIM (m = 72309, d = 20958; McCallum, 1997). |
| Dataset Splits | No | The paper mentions scaling data points to unit norm in Section 7.1 ('To make sure our functions are scaled such that the norm of the Hessian is bounded, we scale the above data set points to unit norm.') but does not provide specific details on how the datasets (MNIST, CoverType, Mushrooms, RealSIM) were split into training, validation, or test sets. |
| Hardware Specification | No | The paper describes experiments and compares performance across various algorithms and datasets, including plotting 'log(Error) as a function of the time elapsed' in Figure 4. However, it does not specify any particular hardware components such as CPU models, GPU models, memory specifications, or details of the computing environment used for these experiments. |
| Software Dependencies | No | The paper discusses various algorithms such as SVRG, SAGA, AdaGrad, BFGS, gradient descent, and SGD, implying that they were implemented. However, it does not name any specific software with version numbers (e.g., a programming language such as Python with its version, or libraries/frameworks such as PyTorch or TensorFlow with versions). |
| Experiment Setup | Yes | In Section 7.2, under 'Experiment Details' and 'Choice of Parameters for LiSSA', the paper states: 'For all of the classification tasks we choose two values of λ: 1/m and 10/m, where m is the number of training examples.'; 'We set the number of inner iterations of SVRG to 2m for the case when λ = 1/m, and we parameter tune the number of inner iterations when λ = 10/m. The stepsizes for all of the algorithms are parameter tuned by an exhaustive search over the parameters.'; 'For stochastic gradient descent, we use the variable step size ηt = γ/t which is usually the prescribed step size, and we hand tune the parameter γ.'; 'To pick the parameters for our algorithm, we observe that it exhibits smooth behavior even in the case of S1 = 1, so this is used for the experiments. However, we observe that increasing S2 has a positive effect on the convergence of the algorithm up to a certain point... As the theory predicts S2 to be of the order κ ln(κ), for our experiments we determine an estimate for κ and set S2 to around κ ln(κ).' |
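The experimental objective described above, ℓ2-regularized logistic regression over unit-norm examples with λ set to 1/m, can be sketched as follows. This is an illustrative reconstruction, not the authors' released code; the function names are our own:

```python
import numpy as np

def logistic_objective(w, X, y, lam):
    """l2-regularized logistic regression loss over labels y in {-1, +1}.

    lam is the regularization weight; the paper uses 1/m and 10/m,
    where m is the number of training examples.
    """
    margins = y * (X @ w)
    # np.logaddexp(0, -z) computes log(1 + exp(-z)) stably
    loss = np.mean(np.logaddexp(0.0, -margins))
    return loss + 0.5 * lam * np.dot(w, w)

def logistic_gradient(w, X, y, lam):
    """Gradient of logistic_objective with respect to w."""
    m = X.shape[0]
    margins = y * (X @ w)
    # d/dz log(1 + exp(-z)) = -1 / (1 + exp(z))
    coeffs = -y / (1.0 + np.exp(margins))
    return X.T @ coeffs / m + lam * w
```

As the paper notes, the data points are first scaled to unit norm (e.g. `X /= np.linalg.norm(X, axis=1, keepdims=True)`) so that the Hessian's norm is bounded.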
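The LiSSA step that the S1/S2 parameters above control can be sketched as a truncated Neumann-series estimate of the Newton direction: S1 independent estimators are averaged, each built from S2 single-sample Hessian-vector products via the recursion u_j = g + (I − H_i) u_{j−1}. The sketch below, a hypothetical implementation for the ℓ2-regularized logistic objective (unit-norm rows keep the per-sample Hessian norm below 1), is ours, not the authors' code:

```python
import numpy as np

def lissa_step(w, X, y, lam, S1=1, S2=100, seed=None):
    """Estimate the Newton step H^{-1} g for l2-regularized logistic
    regression (labels in {-1, +1}) via LiSSA-style sampling.

    Averages S1 estimators; each runs S2 steps of
    u_j = g + (I - H_i) u_{j-1} with a single-sample Hessian H_i.
    Assumes unit-norm rows, so ||H_i|| <= 1/4 + lam < 1 for small lam.
    """
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    margins = y * (X @ w)
    g = X.T @ (-y / (1.0 + np.exp(margins))) / m + lam * w  # full gradient
    estimates = []
    for _ in range(S1):
        u = g.copy()  # u_0 = g
        for _ in range(S2):
            xi = X[rng.integers(m)]
            s = 1.0 / (1.0 + np.exp(-xi @ w))  # sigmoid(x_i . w)
            # single-sample Hessian-vector product: (s(1-s) x x^T + lam I) u
            hv = s * (1.0 - s) * xi * (xi @ u) + lam * u
            u = g + u - hv  # u_j = g + (I - H_i) u_{j-1}
        estimates.append(u)
    return np.mean(estimates, axis=0)  # approximates H^{-1} g
```

The update is then `w -= lissa_step(w, X, y, lam, S1, S2)`; the paper's guidance of S1 = 1 and S2 on the order of κ ln(κ) fits directly into these parameters.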