Armijo Line-search Can Make (Stochastic) Gradient Descent Provably Faster
Authors: Sharan Vaswani, Reza Babanezhad Harikandeh
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We strengthen these results and show that if the objective function satisfies a certain nonuniform smoothness condition, GD-LS can result in a faster convergence rate than GD(1/L). In particular, we prove that for convex objectives corresponding to logistic regression and multi-class classification, GD-LS can converge to the optimum at a linear rate, and hence improves over the sublinear convergence of GD(1/L). Furthermore, for non-convex objectives satisfying gradient domination (e.g. those corresponding to the softmax policy gradient in RL or generalized linear models with a logistic link function), GD-LS can match the fast convergence of algorithms tailored for these specific settings. |
| Researcher Affiliation | Collaboration | ¹Simon Fraser University, ²Samsung AI Montreal. Correspondence to: Sharan Vaswani <EMAIL>, Reza Babanezhad <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 GD with Armijo Line-search (GD-LS). 1: Input: θ₀, ηmax, c ∈ (0, 1), β ∈ (0, 1) 2: for t = 0, . . . , T − 1 do 3: ηt ← ηmax 4: while f(θt − ηt ∇f(θt)) > f(θt) − c ηt ‖∇f(θt)‖₂² do 5: ηt ← β ηt 6: end while 7: θt+1 = θt − ηt ∇f(θt) 8: end for |
| Open Source Code | No | The paper does not contain any explicit statement about providing source code or a link to a code repository. |
| Open Datasets | Yes | Comparing GD-LS with c = 1/2 and ηmax = 10⁸ and GD(1/L) for unregularized logistic regression on the ijcnn dataset (Chang & Lin, 2011). |
| Dataset Splits | No | The paper mentions using the 'ijcnn dataset' and a 'synthetic separable dataset' but does not provide specific details on how these datasets were split into training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, used to replicate the experiments. |
| Experiment Setup | Yes | Figure 1. Comparing GD-LS with c = 1/2 and ηmax = 10⁸ and GD(1/L) for unregularized logistic regression on the ijcnn dataset (Chang & Lin, 2011). Figure 2. Comparing GD-LS with c = 1/2, ηmax = 10⁸ and GD(1/L) for unregularized logistic regression on a synthetic separable dataset with γ = 0.1, n = 10⁴ and d = 200. Figure 3. Comparing GD-LS with c = 1/2, ηmax = 10⁴ and GD(1/L) for GLM on a synthetic dataset with n = 10⁴, d = 200, θ = 1. |
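The backtracking rule quoted in the Pseudocode row can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code (the paper releases none): the tiny synthetic dataset, β = 0.8, and T = 200 steps are hypothetical choices made here; only c = 1/2 and a very large ηmax mirror the reported setup.

```python
import numpy as np

def gd_armijo(f, grad, theta0, eta_max=1e8, c=0.5, beta=0.8, T=200):
    """GD with Armijo backtracking line-search (GD-LS), per Algorithm 1."""
    theta = theta0.copy()
    for _ in range(T):
        g = grad(theta)
        eta = eta_max
        # Shrink eta until the Armijo condition holds:
        #   f(theta - eta * g) <= f(theta) - c * eta * ||g||^2
        while f(theta - eta * g) > f(theta) - c * eta * (g @ g):
            eta *= beta
        theta = theta - eta * g
    return theta

# Hypothetical separable data for unregularized logistic regression.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = (X @ w_true > 0).astype(float)

def loss(w):
    # Numerically stable logistic loss: softplus(z) - y*z, averaged.
    z = X @ w
    return np.mean(np.maximum(z, 0) + np.log1p(np.exp(-np.abs(z))) - y * z)

def grad(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y) / len(y)

w = gd_armijo(loss, grad, np.zeros(3))
```

Starting each iteration from a huge ηmax lets the backtracking accept large steps whenever the local smoothness allows it, which is the mechanism behind the paper's faster rates for logistic-type losses.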