Optimizing $(L_0, L_1)$-Smooth Functions by Gradient Methods
Authors: Daniil Vankov, Anton Rodomanov, Angelia Nedić, Lalitha Sankar, Sebastian Stich
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | G NUMERICAL RESULTS: In Fig. 1, we compare the performance of the analyzed methods for solving optimization problem (1) with the function $f(x) = \frac{1}{p}\|x\|^p$. We fix $L_1 = 1$ and choose $L_0 = (\frac{p-2}{L_1})^{p-2}$ according to Example A.1. For GM, we choose stepsizes according to (9), (12) and (13). For NGM, we use time-varying coefficients $\beta_k = \frac{\hat{R}}{k+1}$ with different values of $\hat{R} \in \{\frac{1}{2}R, 2R, 10R\}$, which allows us to study the robustness of this method to our initial guess of the unknown initial distance to the solution. Note that, for this particular problem, the choice $\hat{R} = R$ is rather special and allows the method to find the exact solution after one iteration, so we do not consider it. We observe that NGM and PS-GM outperform GM with stepsizes from (9), (12) and (13). This can be explained by the fact that the complexity of GM depends on the particular choice of $(L_0, L_1)$, while the complexity of NGM and PS-GM involves the optimal parameters $L_0, L_1$, as discussed in Section 4. Moreover, a closer initial estimate $\hat{R}$ of the true distance $R$ leads to faster convergence of NGM to a solution. In Fig. 2, we present an experiment studying the performance of GM with the stepsize rule (9) depending on the choice of $(L_0, L_1)$. For each choice of $L_1 \in \{1, 2, 4, 8, 16\}$ we set $L_0 = (\frac{p-2}{L_1})^{p-2}$, according to Example A.1. As expected from the theory (see the corresponding discussion at the end of Section 4), the choice of the $(L_0, L_1)$ pair is crucial in practice for the performance of GM and depends on the target accuracy $\epsilon$. In Fig. 3, we conduct an experiment for accelerated methods and consider GM with stepsize (9), Algorithm 1 with $T(\cdot)$ being the gradient update with stepsize (9), and two variants of the normalized Similar Triangles Method (STM and STM-Max) from Gorbunov et al. (2024). STM uses normalization by the norm of the gradient at the current point in a gradient step, while STM-Max normalizes by the largest gradient norm over the optimization trajectory. It is worth noting that only STM-Max has theoretical convergence guarantees. We set $L_1 = 1$, $L_0 = (\frac{p-2}{L_1})^{p-2}$ (see Example A.1) with various $p$. We observe that for smaller values of $p$, Algorithm 1 outperforms STM and STM-Max. |
| Researcher Affiliation | Academia | Daniil Vankov Arizona State University EMAIL Anton Rodomanov CISPA EMAIL Angelia Nedić Arizona State University EMAIL Lalitha Sankar Arizona State University EMAIL Sebastian U. Stich CISPA EMAIL |
| Pseudocode | Yes | Algorithm 1 AGMsDR. 1: Input: initial point $x_0 \in \mathbb{R}^d$, update rule $T(\cdot)$. 2: $v_0 = x_0$, $A_0 = 0$, $\zeta_0(x) = \frac{1}{2}\|x - v_0\|^2$. 3: for $k = 0, 1, \ldots$ do 4: $y_k = \arg\min_y \{f(y): y = v_k + \beta(x_k - v_k),\ \beta \in [0, 1]\}$. 5: $x_{k+1} = T(y_k)$, $M_k = \frac{\|\nabla f(y_k)\|^2}{2[f(y_k) - f(x_{k+1})]}$ $(> 0)$. 6: Find $a_{k+1} > 0$ from the equation $M_k a_{k+1}^2 = A_k + a_{k+1}$. Set $A_{k+1} = A_k + a_{k+1}$. 7: $v_{k+1} = \arg\min_{x \in \mathbb{R}^d}\{\zeta_{k+1}(x) := \zeta_k(x) + a_{k+1}[f(y_k) + \langle \nabla f(y_k), x - y_k \rangle]\}$. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to code repositories. |
| Open Datasets | No | The paper conducts numerical experiments using a synthetic function f(x) = 1/p |x|^p, as described in Appendix G, rather than external, publicly available datasets. |
| Dataset Splits | No | The paper does not use external datasets, but a synthetic function for numerical experiments. Therefore, there are no dataset splits described. |
| Hardware Specification | No | The paper discusses numerical results in Appendix G but does not provide any specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper discusses numerical results in Appendix G but does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | G NUMERICAL RESULTS: In Fig. 1, we compare the performance of the analyzed methods for solving optimization problem (1) with the function $f(x) = \frac{1}{p}\|x\|^p$. We fix $L_1 = 1$ and choose $L_0 = (\frac{p-2}{L_1})^{p-2}$ according to Example A.1. For GM, we choose stepsizes according to (9), (12) and (13). For NGM, we use time-varying coefficients $\beta_k = \frac{\hat{R}}{k+1}$ with different values of $\hat{R} \in \{\frac{1}{2}R, 2R, 10R\}$, which allows us to study the robustness of this method to our initial guess of the unknown initial distance to the solution. Note that, for this particular problem, the choice $\hat{R} = R$ is rather special and allows the method to find the exact solution after one iteration, so we do not consider it. We observe that NGM and PS-GM outperform GM with stepsizes from (9), (12) and (13). This can be explained by the fact that the complexity of GM depends on the particular choice of $(L_0, L_1)$, while the complexity of NGM and PS-GM involves the optimal parameters $L_0, L_1$, as discussed in Section 4. Moreover, a closer initial estimate $\hat{R}$ of the true distance $R$ leads to faster convergence of NGM to a solution. In Fig. 2, we present an experiment studying the performance of GM with the stepsize rule (9) depending on the choice of $(L_0, L_1)$. For each choice of $L_1 \in \{1, 2, 4, 8, 16\}$ we set $L_0 = (\frac{p-2}{L_1})^{p-2}$, according to Example A.1. As expected from the theory (see the corresponding discussion at the end of Section 4), the choice of the $(L_0, L_1)$ pair is crucial in practice for the performance of GM and depends on the target accuracy $\epsilon$. In Fig. 3, we conduct an experiment for accelerated methods and consider GM with stepsize (9), Algorithm 1 with $T(\cdot)$ being the gradient update with stepsize (9), and two variants of the normalized Similar Triangles Method (STM and STM-Max) from Gorbunov et al. (2024). STM uses normalization by the norm of the gradient at the current point in a gradient step, while STM-Max normalizes by the largest gradient norm over the optimization trajectory. It is worth noting that only STM-Max has theoretical convergence guarantees. We set $L_1 = 1$, $L_0 = (\frac{p-2}{L_1})^{p-2}$ (see Example A.1) with various $p$. We observe that for smaller values of $p$, Algorithm 1 outperforms STM and STM-Max. |
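The quoted setup is easy to reproduce on the one-dimensional synthetic problem. Below is a minimal sketch of the normalized gradient method (NGM) on $f(x) = \frac{1}{p}|x|^p$, assuming (as reconstructed from the quoted Appendix G) the time-varying coefficients $\beta_k = \hat{R}/(k+1)$ and minimizer $x^\star = 0$, so the true initial distance is $R = |x_0|$. The function names `f`, `grad`, and `ngm` are illustrative, not from the paper.

```python
import math

def f(x, p):
    """Synthetic objective f(x) = (1/p)|x|^p from Appendix G."""
    return abs(x) ** p / p

def grad(x, p):
    """Gradient of f: sign(x) * |x|^(p-1)."""
    return math.copysign(abs(x) ** (p - 1), x)

def ngm(x0, p, R_hat, iters):
    """Normalized gradient method with beta_k = R_hat / (k + 1)."""
    x = x0
    for k in range(iters):
        g = grad(x, p)
        if g == 0:  # exact minimizer reached
            break
        x = x - (R_hat / (k + 1)) * g / abs(g)  # unit-length direction
    return x

if __name__ == "__main__":
    x0, p = 1.0, 4
    R = abs(x0)  # true distance to the minimizer x* = 0
    # Mirror the paper's robustness study over guesses of R_hat.
    for R_hat in (0.5 * R, 2 * R, 10 * R):
        xK = ngm(x0, p, R_hat, 200)
        print(f"R_hat = {R_hat:4.1f}: f(x_K) = {f(xK, p):.3e}")
```

Note the special case mentioned in the quote: with $\hat{R} = R$ the very first step has length exactly $R$ and lands on the minimizer, which is why that guess is excluded from the comparison.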