Non-Uniform Smoothness for Gradient Descent

Authors: Albert S. Berahas, Lindon Roberts, Fred Roosta

TMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we provide some brief numerical experiments confirming the global linear rate for Algorithm 1 (with an appropriate choice of R_k and no tuning of the fixed stepsize η = 1) for objectives of the form f(x) = ||x||_2^{2p} as in Section 4 and f(x) = ||x||_{2p}^{2p} as in Section 5. In all cases, we use d = 10 dimensional problems, starting point x_0 = (1, ..., 1)^T, and p = 1, ..., 5.
Researcher Affiliation Academia Albert S. Berahas (EMAIL), Department of Industrial and Operations Engineering, University of Michigan; Lindon Roberts (EMAIL), School of Mathematics and Statistics, University of Sydney; Fred Roosta (EMAIL), School of Mathematics and Physics, University of Queensland, and ARC Training Centre for Information Resilience
Pseudocode Yes Algorithm 1: Gradient Descent with LFSO. Input: starting point x_0 ∈ R^d, stepsize factor η > 0. 1: for k = 0, 1, 2, ... do 2: Choose any R_k > 0 3: Set R̃_k := max{R_k, (η / L(x_k, R_k)) ||∇f(x_k)||} (2.3) 4: x_{k+1} = x_k - (η / L(x_k, R̃_k)) ∇f(x_k) (2.4)
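As a minimal sketch (not the authors' implementation, which is not released), Algorithm 1 can be written down for the first test objective f(x) = ||x||_2^{2p}. The oracle L(x, R) = 2p(2p-1)(||x||_2 + R)^{2p-2} used below is an assumed LFSO, chosen because it bounds the Hessian norm of f over the radius-R ball around x; the paper's exact oracle may differ.

```python
import numpy as np

def lfso(x, R, p):
    # Assumed LFSO for f(x) = ||x||_2^(2p): an upper bound on
    # sup ||grad^2 f(y)|| over the ball of radius R centered at x.
    return 2 * p * (2 * p - 1) * (np.linalg.norm(x) + R) ** (2 * p - 2)

def gd_lfso(x0, p, eta=1.0, iters=200):
    """Algorithm 1 (gradient descent with an LFSO) for f(x) = ||x||_2^(2p)."""
    x = x0.astype(float)
    for _ in range(iters):
        g = 2 * p * np.linalg.norm(x) ** (2 * p - 2) * x    # gradient of ||x||_2^(2p)
        Rk = 2 * np.linalg.norm(x)                          # R_k choice from the experiments
        Rk_t = max(Rk, eta / lfso(x, Rk, p) * np.linalg.norm(g))   # step (2.3)
        x = x - eta / lfso(x, Rk_t, p) * g                  # step (2.4)
    return x
```

With this oracle the stepsize adapts to the local curvature, so the iterates contract by a constant factor per iteration even for p > 1, consistent with the global linear rate claimed for Algorithm 1.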
Open Source Code No No explicit statement or link for open-source code was found in the paper.
Open Datasets No The numerical experiments use synthetic objective functions (e.g., f(x) = ||x||_2^{2p} and f(x) = ||x||_{2p}^{2p}) rather than actual datasets. Therefore, no information on publicly available datasets is provided.
Dataset Splits No The paper uses synthetic objective functions for numerical experiments, not traditional datasets that would require train/test/validation splits. Therefore, no dataset split information is provided.
Hardware Specification No The paper does not provide specific details about the hardware used for running the experiments. It only mentions 'CPU time (s)' in the runtime performance graphs without specifying the CPU model or other hardware components.
Software Dependencies No The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation.
Experiment Setup Yes In all cases, we use d = 10 dimensional problems, starting point x_0 = (1, ..., 1)^T, and p = 1, ..., 5. For standard gradient descent with fixed stepsize, we get the results shown in Figure 1. To see sufficiently fast convergence, some mild tuning of the stepsize η was required. For both objectives, we see that gradient descent achieves a global linear rate for p = 1 as expected, but a clearly sublinear rate for all p > 1, again in line with expectations. When running Algorithm 1, we use R_k = ||∇g(x_k)||_2 = 2||x_k||_2 for f(x) = ||x||_2^{2p} (based on the framework of Section 4) and R_k = ||x_k|| for f(x) = ||x||_{2p}^{2p} (based on the framework of Section 5).
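The second experiment (Section 5 framework) can be sketched in the same way for the separable objective f(x) = sum_i x_i^{2p}. Two details below are assumptions, not from the paper: the LFSO L(x, R) = 2p(2p-1)(||x||_inf + R)^{2p-2}, which bounds the Hessian norm of f over a radius-R ball around x, and the choice of the infinity norm in R_k (the norm subscript is not legible in the extraction).

```python
import numpy as np

def run_experiment(p, d=10, eta=1.0, iters=300):
    """Sketch of Algorithm 1 on f(x) = sum_i x_i^(2p) with the stated setup
    (d = 10, x_0 = (1, ..., 1)^T). Returns the objective value per iteration."""
    # Assumed LFSO: |x_i| <= ||x||_inf + R on the ball, so the diagonal Hessian
    # entries 2p(2p-1) x_i^(2p-2) are bounded by the value below.
    L = lambda x, R: 2 * p * (2 * p - 1) * (np.linalg.norm(x, np.inf) + R) ** (2 * p - 2)
    x = np.ones(d)                                        # starting point x_0
    fvals = []
    for _ in range(iters):
        g = 2 * p * x ** (2 * p - 1)                      # gradient of sum_i x_i^(2p)
        Rk = np.linalg.norm(x, np.inf)                    # assumed norm for R_k = ||x_k||
        Rk_t = max(Rk, eta / L(x, Rk) * np.linalg.norm(g))   # step (2.3)
        x = x - eta / L(x, Rk_t) * g                      # step (2.4)
        fvals.append(np.sum(x ** (2 * p)))
    return np.array(fvals)
```

Under these assumptions the iterates again shrink by a fixed factor per step for every p, so a plot of `fvals` on a log scale should be a straight line, matching the global linear rate the experiments report for Algorithm 1.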