Armijo Line-search Can Make (Stochastic) Gradient Descent Provably Faster

Authors: Sharan Vaswani, Reza Babanezhad Harikandeh

ICML 2025

Reproducibility Variable — Result — LLM Response
Research Type — Theoretical. "We strengthen these results and show that if the objective function satisfies a certain nonuniform smoothness condition, GD-LS can result in a faster convergence rate than GD(1/L). In particular, we prove that for convex objectives corresponding to logistic regression and multi-class classification, GD-LS can converge to the optimum at a linear rate, and hence improves over the sublinear convergence of GD(1/L). Furthermore, for non-convex objectives satisfying gradient domination (e.g., those corresponding to the softmax policy gradient in RL or generalized linear models with a logistic link function), GD-LS can match the fast convergence of algorithms tailored for these specific settings."
Researcher Affiliation — Collaboration. ¹Simon Fraser University, ²Samsung AI, Montreal. Correspondence to: Sharan Vaswani <EMAIL>, Reza Babanezhad <EMAIL>.
Pseudocode — Yes.

Algorithm 1 GD with Armijo Line-search (GD-LS)
1: Input: θ0, ηmax, c ∈ (0, 1), β ∈ (0, 1)
2: for t = 0, . . . , T − 1 do
3:   ηt ← ηmax
4:   while f(θt − ηt ∇f(θt)) > f(θt) − c ηt ‖∇f(θt)‖₂² do
5:     ηt ← ηt β
6:   end while
7:   ηt ← ηt
8:   θt+1 = θt − ηt ∇f(θt)
9: end for
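The backtracking loop in Algorithm 1 can be sketched in a few lines of Python. This is an illustrative implementation, not the authors' code; the function name and the quadratic test objective below are assumptions:

```python
import numpy as np

def gd_armijo(f, grad_f, theta0, eta_max=1e8, c=0.5, beta=0.8, T=100):
    """Gradient descent with Armijo backtracking line-search (GD-LS).

    Each iteration restarts the step-size at eta_max and shrinks it by
    beta until the Armijo sufficient-decrease condition holds:
        f(theta - eta * g) <= f(theta) - c * eta * ||g||^2
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(T):
        g = grad_f(theta)
        eta = eta_max
        # backtrack until the candidate point decreases f enough
        while f(theta - eta * g) > f(theta) - c * eta * np.dot(g, g):
            eta *= beta
        theta = theta - eta * g
    return theta
```

For example, minimizing the convex quadratic f(θ) = ½‖θ‖² from θ0 = (3, −4) drives the iterate to the origin, since any accepted step with η ≤ 1 contracts θ by a factor of (1 − η).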
Open Source Code — No. The paper does not contain any explicit statement about providing source code or a link to a code repository.
Open Datasets — Yes. "Comparing GD-LS with c = 1/2 and ηmax = 10^8 and GD(1/L) for unregularized logistic regression on the ijcnn dataset (Chang & Lin, 2011)."
Dataset Splits — No. The paper mentions using the ijcnn dataset and a synthetic separable dataset but does not specify how these were split into training, validation, or test sets.
Hardware Specification — No. The paper does not report the hardware (e.g., GPU/CPU models, processor types, memory) used to run its experiments.
Software Dependencies — No. The paper does not list the ancillary software (e.g., library names with version numbers) needed to replicate the experiments.
Experiment Setup — Yes. "Figure 1. Comparing GD-LS with c = 1/2 and ηmax = 10^8 and GD(1/L) for unregularized logistic regression on the ijcnn dataset (Chang & Lin, 2011)." "Figure 2. Comparing GD-LS with c = 1/2, ηmax = 10^8 and GD(1/L) for unregularized logistic regression on a synthetic separable dataset with γ = 0.1, n = 10^4 and d = 200." "Figure 3. Comparing GD-LS with c = 1/2, ηmax = 10^4 and GD(1/L) for GLM on a synthetic dataset with n = 10^4, d = 200, θ = 1."
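A Figure 2-style comparison (GD-LS with a very large ηmax versus GD(1/L) on synthetic separable logistic regression) can be approximated with a small sketch. The problem sizes are shrunk from the paper's n = 10^4, d = 200 for a quick run, and all helper names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10                      # shrunk from the paper's n = 10^4, d = 200
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = np.sign(X @ w_star)             # labels in {-1, +1}; separable by construction

def loss(w):
    # numerically stable logistic loss: mean of log(1 + exp(-y_i * <x_i, w>))
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def grad(w):
    s = -y / (1.0 + np.exp(y * (X @ w)))
    return X.T @ s / n

# smoothness constant of the logistic loss: L <= lambda_max(X^T X) / (4n)
L = np.linalg.eigvalsh(X.T @ X).max() / (4.0 * n)

def run_gd(step, T=200):
    """Fixed step-size GD, e.g. GD(1/L); returns the final loss."""
    w = np.zeros(d)
    for _ in range(T):
        w = w - step * grad(w)
    return loss(w)

def run_gd_ls(eta_max=1e8, c=0.5, beta=0.8, T=200):
    """GD with Armijo backtracking, restarted from eta_max each iteration."""
    w = np.zeros(d)
    for _ in range(T):
        g = grad(w)
        eta = eta_max
        while loss(w - eta * g) > loss(w) - c * eta * (g @ g):
            eta *= beta
        w = w - eta * g
    return loss(w)
```

On separable instances like this one, GD-LS typically drives the loss well below GD(1/L) for the same iteration budget, in line with the linear-versus-sublinear separation the paper proves.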