Adaptive backtracking for faster optimization
Authors: Joao V. Cavalcanti, Laurent Lessard, Ashia Wilson
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To conclude, we present numerical experiments on several real-world problems confirming that using online adaptive factors in line search subroutines can produce higher-quality step-sizes and significantly reduce the total number of function evaluations standard backtracking subroutines require. ... We present four experiments illustrating different ways and scenarios in which our adaptive backtracking line search (ABLS) subroutine (Algorithm 2) can outperform regular backtracking (Algorithm 1). |
| Researcher Affiliation | Academia | 1 MIT, 2 Northeastern University |
| Pseudocode | Yes | Algorithm 1 Backtracking Line Search ... Algorithm 2 Adaptive Backtracking Line Search |
| Open Source Code | No | The results presented in Fig. 4 correspond to experiment 1 from (Galli et al., 2023) without any modifications, and are run using the base code from the same paper, which can be found at: https://github.com/leonardogalli91/PoNoS This refers to a third-party codebase used for comparison, not the authors' own implementation of their proposed Adaptive Backtracking Line Search (ABLS). |
| Open Datasets | Yes | We take observations from seven datasets: A9A, GISETTE_SCALE (G_SCALE), MUSHROOMS, PHISHING and WEB-1 from LIBSVM (Chang & Lin, 2011), PROTEIN from KDD Cup 2004 (Caruana et al., 2004) and MNIST (LeCun et al., 1998) ... We consider A from eight datasets: IRIS, DIGITS, WINE, OLIVETTI_FACES and LFW_PAIRS from scikit-learn (Pedregosa et al., 2011), SPEAR3 and SPEAR10 (Lorenz et al., 2014) and SPARCO (van den Berg et al., 2007). ... We take A from the MovieLens 100K dataset (Harper & Konstan, 2015). |
| Dataset Splits | No | The paper describes the datasets used and initial points, but does not provide specific train/validation/test splits, or details on how the data was partitioned for model evaluation. The experiments focus on optimization performance (e.g., function evaluations) rather than generalization metrics on predefined splits. |
| Hardware Specification | Yes | All experiments were run on a supercomputer cluster containing Intel Xeon Platinum 8260 and Intel Xeon Gold 6248 CPUs (Reuther et al., 2018). |
| Software Dependencies | No | The paper mentions using existing libraries/frameworks like LIBSVM and scikit-learn for datasets and mentions base algorithms like GD, AGD, Adagrad, and FISTA. However, it does not specify version numbers for any of the software dependencies used in their own implementation of the proposed methodology. |
| Experiment Setup | Yes | We set the starting point x0 as the origin and fix ϵ = 0.01 in (4b) on all experiments. We also fix ρ, but change it according to the base method. ... we consider four choices of initial step-sizes, α ∈ {10¹, 10², 10³, 10⁴}/L ... We adopt the standard choice c = 10⁻⁴ ... for BLS used with GD and Adagrad but, ... we choose c = 1/2 in the case of AGD. Also, we use the regularization parameter γ as the strong convexity parameter input for AGD. ... We use the origin as the initial point and 0.1 as the initial step-size. ... pick different values for initial step-sizes, {0.05, 0.5, 5, 50}, and ρ. Namely, we let ρ ∈ {0.2, 0.3, 0.5, 0.6} for the BLS variants, but fix ρ = 0.3 and ρ = 0.9 for the ABLS GD and AGD variants, respectively. |
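For orientation, the baseline the paper compares against (its Algorithm 1) is standard Armijo backtracking: starting from an initial step-size α, repeatedly multiply by a fixed factor ρ until the sufficient-decrease condition with constant c holds. The sketch below is a generic illustration of that baseline, not the authors' ABLS; the function and parameter names (`backtracking_line_search`, `alpha0`, `max_shrinks`) are ours, and the parameter defaults mirror the standard choices quoted in the setup row (c = 10⁻⁴, fixed ρ).

```python
import numpy as np

def backtracking_line_search(f, grad_f, x, alpha0=1.0, rho=0.5, c=1e-4,
                             max_shrinks=50):
    """Standard backtracking line search (Armijo condition).

    Shrinks the step-size by a fixed factor rho until
        f(x + alpha * d) <= f(x) + c * alpha * <grad f(x), d>
    holds, where d = -grad f(x) is the steepest-descent direction.
    Each failed test costs one extra function evaluation, which is
    the cost the adaptive variant in the paper aims to reduce.
    """
    alpha = alpha0
    fx = f(x)
    g = grad_f(x)
    d = -g                       # steepest-descent direction
    slope = float(g @ d)         # directional derivative (negative)
    for _ in range(max_shrinks):
        if f(x + alpha * d) <= fx + c * alpha * slope:
            return alpha         # sufficient decrease achieved
        alpha *= rho             # fixed backtracking factor
    return alpha                 # fallback after max_shrinks halvings
```

On a simple quadratic f(x) = ½‖x‖², each shrink halves α until (1 − α)² ≤ 1 − 2cα, so the search accepts the first step-size at or below roughly 1. The ABLS idea quoted above replaces the fixed ρ with an online adaptive factor so that fewer such trial evaluations are wasted.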