Toward a Unified Theory of Gradient Descent under Generalized Smoothness
Authors: Alexander Tyurin
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify our theoretical results by asking whether it is necessary to use the step size rule from Algorithm 1, or whether the step size rules of Li et al. (2024a) and Vankov et al. (2024) suffice instead. We first take the function f : [0, 0.1] → ℝ ∪ {+∞} defined as f(x) = −log x − log(0.1 − x) (see Figure 2), which has its minimum at x = 0.05. This function is (ρ, 800, 2)-smooth with ρ = 2; however, it is not (L₀, L₁)-smooth for any L₀, L₁ ≥ 0. Consequently, we run Algorithm 1 with ℓ(s) = 800 + 2s², starting at x₀ = 10⁻⁷, and observe that it converges⁴ after 75 iterations. Next, we take the step size γₖ = 1/(800 + 2(2f′(x₀))²) from (Li et al., 2024a) and observe that GD requires at least 20,000 iterations because f′(x₀) is huge. Finally, to verify whether the exponent 2 is necessary, we take ℓ(s) = 800 + 2s (Vankov et al., 2024). For this choice of ℓ, GD diverges. |
| Researcher Affiliation | Academia | ¹AIRI, Moscow, Russia; ²Skoltech, Moscow, Russia. Correspondence to: Alexander Tyurin <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Gradient Descent (GD) with ℓ-Smoothness |
| Open Source Code | No | The paper does not provide an explicit statement of code release, a link to a repository, or mention of code in supplementary materials for the described methodology. Figure 1 shows a Python function for step size computation, but it's a code snippet within the paper, not a release statement for the entire work. |
| Open Datasets | No | The paper uses synthetic mathematical functions (e.g., f(x) = −log x − log(0.1 − x) and f(x) = eˣ + e¹⁻ˣ) for its experiments, rather than external datasets, and thus does not provide access information for any public datasets. |
| Dataset Splits | No | The paper's experiments are conducted on synthetic mathematical functions, not datasets, so there are no dataset splits to describe. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | Yes | Figure 1: Python function to compute the step sizes {γk} using SciPy (Virtanen et al., 2020). |
| Experiment Setup | Yes | We first take the function f : [0, 0.1] → ℝ ∪ {+∞} defined as f(x) = −log x − log(0.1 − x) (see Figure 2), which has its minimum at x = 0.05. This function is (ρ, 800, 2)-smooth with ρ = 2; however, it is not (L₀, L₁)-smooth for any L₀, L₁ ≥ 0. Consequently, we run Algorithm 1 with ℓ(s) = 800 + 2s², starting at x₀ = 10⁻⁷, and observe that it converges⁴ after 75 iterations. Next, we take the step size γₖ = 1/(800 + 2(2f′(x₀))²) from (Li et al., 2024a) and observe that GD requires at least 20,000 iterations because f′(x₀) is huge. Finally, to verify whether the exponent 2 is necessary, we take ℓ(s) = 800 + 2s (Vankov et al., 2024). For this choice of ℓ, GD diverges. ⁴Finds x̂ such that f(x̂) − f(x*) ≤ 10⁻⁵. |
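The experiment described in the table can be sketched in a few lines of Python. The objective f(x) = −log x − log(0.1 − x), the bound ℓ(s) = 800 + 2s², and the starting point x₀ = 10⁻⁷ are taken from the text; the step-size rule γₖ = 1/ℓ(2|f′(xₖ)|) used below is an illustrative assumption, not necessarily the paper's Algorithm 1 rule, which is computed by a SciPy-based routine (Figure 1) and may differ.

```python
import math

def f(x):
    # Objective from the paper's experiment; convex on (0, 0.1),
    # minimized at x = 0.05.
    return -math.log(x) - math.log(0.1 - x)

def df(x):
    # Derivative of f; it blows up near the boundary, so f is not
    # (L0, L1)-smooth, but it is ℓ-smooth for ℓ(s) = 800 + 2s².
    return -1.0 / x + 1.0 / (0.1 - x)

def ell(s):
    # The generalized-smoothness bound from the text (ρ = 2 case).
    return 800.0 + 2.0 * s ** 2

def gd(x0, n_iters):
    # Gradient descent with the illustrative adaptive step size
    # γ_k = 1/ℓ(2|f'(x_k)|); an assumption standing in for the
    # paper's actual step-size computation.
    x = x0
    for _ in range(n_iters):
        g = df(x)
        gamma = 1.0 / ell(2.0 * abs(g))
        x = x - gamma * g
    return x

x_hat = gd(x0=1e-7, n_iters=1000)
print(x_hat)  # close to the minimizer 0.05
```

Note the contrast the paper draws: a step size frozen at γₖ = 1/ℓ(2f′(x₀)) stays tiny forever because |f′(x₀)| ≈ 10⁷ at x₀ = 10⁻⁷, whereas re-evaluating ℓ at the current gradient lets the step grow as the iterate moves away from the boundary.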