Path Length Bounds for Gradient Descent and Flow

Authors: Chirag Gupta, Sivaraman Balakrishnan, Aaditya Ramdas

JMLR 2021 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In Appendix G.1 we simulate the lower bound constructed in the proof of Theorem 18 with GD. This simulation allows us to verify that the dependence of the path length of the lower bound construction is indeed Ω(κ1/4/ log κ) (and not something larger like Ω( κ)). We observed that the dependence is 3κ1/4/ log κ. Thus the dependence on κ is correct, but the constants are loose. Observe that the function constructed above is not strongly convex. Thus proving a poly(κ) lower bound for strongly convex functions remains an open problem. In Appendix G.2 we simulate the quadratic lower bound constructed in the proof of Theorem 18 and compare the lower bound and upper bound (Theorem 10) with the empirically observed path length.
Researcher Affiliation Academia Chirag Gupta EMAIL Machine Learning Department Carnegie Mellon University Pittsburgh, PA 15217, USA Sivaraman Balakrishnan EMAIL Department of Statistics and Data Science Carnegie Mellon University Pittsburgh, PA 15217, USA Aaditya Ramdas EMAIL Department of Statistics and Data Science, Machine Learning Department Carnegie Mellon University Pittsburgh, PA 15217, USA
Pseudocode No The paper describes algorithms like Gradient flow (GF): xt = f(xt) and Gradient descent (GD): xk+1 = xk η f(xk) through mathematical equations. However, it does not present these or any other procedures in a structured pseudocode block or a clearly labeled algorithm section.
Open Source Code No The paper does not contain an explicit statement about the release of source code for the methodology described, nor does it provide a link to a code repository. It mentions simulations in the appendix but not code availability.
Open Datasets No The paper is theoretical, focusing on mathematical bounds for optimization algorithms. It does not use any external datasets for its analysis or simulations; instead, it constructs specific functions for lower bound proofs and simulations. Therefore, no public or open dataset information is provided.
Dataset Splits No The paper does not utilize any datasets; therefore, there is no information provided regarding training, test, or validation splits.
Hardware Specification No The paper describes simulations conducted in Appendix G (
Software Dependencies No The paper mentions "We use MATLAB’s numerical integration function integral, with the Abs Tol parameter set to 10-50" in Appendix G.2. While MATLAB is named, a specific version number is not provided, making it not fully reproducible according to the criteria.
Experiment Setup Yes In Appendix G.1: "For each of these functions with different κ values, the step-size for GD and the initialization point are set as described in Section E.2. The path length is computed by adding the lengths of all the updates, until xk 2 10-6. At this point the remaining distance to the origin xk 2 is added to the path length computation. Finally, µ is computed as the effective PKL constant from the iterates actually seen while running the simulation: µ = max t {0,1,...k} f(xt) 2 / 2(f(xt) f )." In Appendix G.2: "The GF path length is computed by performing the path length integral numerically (since we know xt at each point in the case of quadratics). We use MATLAB’s numerical integration function integral, with the Abs Tol parameter set to 10-50. The GD path length is computed by simulating GD with step size η = 1/2a1 and adding the lengths of all the updates, until every component except the last one is smaller than 10-2, ie maxi =d(xk)(i) < 10-2."