From Stability to Chaos: Analyzing Gradient Descent Dynamics in Quadratic Regression

Authors: Xuxing Chen, Krishna Balasubramanian, Promit Ghosal, Bhavya Kumar Agrawalla

TMLR 2024

Reproducibility Report (columns: Variable, Result, LLM Response)
Research Type: Experimental
LLM response: "We conduct a comprehensive investigation into the dynamics of gradient descent using large-order constant step-sizes in the context of quadratic regression models. ... Our simulations indicate that these five phases also manifest with generic non-orthogonal data. We also empirically investigate the generalization performance when training in the various non-monotonic (and non-divergent) phases." See also Section 4, "Experimental investigations".
Researcher Affiliation: Academia
LLM response: "Xuxing Chen (EMAIL), Department of Mathematics, University of California, Davis. Krishnakumar Balasubramanian (EMAIL), Department of Statistics, University of California, Davis. Promit Ghosal (EMAIL), Department of Mathematics, Brandeis University. Bhavya Agrawalla (EMAIL), Department of Mathematics, Massachusetts Institute of Technology."
Pseudocode: No
LLM response: "The paper describes the methodology using mathematical formulations and textual explanations, but it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured, step-by-step procedures in a code-like format."
Open Source Code: No
LLM response: "The paper does not provide an explicit statement about the release of source code, a link to a code repository, or any mention that code is available in supplementary materials for the methodology described."
Open Datasets: No
LLM response: "We generate the ground-truth matrix U ∈ R^{d×m} where each entry is sampled from the standard normal distribution. The training data points, collected in the data matrix X ∈ R^{n×d}, are the first n rows of a randomly generated orthogonal matrix. The labels are generated via the model in Section 3.2, i.e., y_i = (1/(md)) Σ_{j=1}^m (X_i^⊤ u_j)^2 + ε_i, where ε_i is scalar noise sampled from a zero-mean normal distribution, with variance equal to 0, 0.25, or 1 in different experiments."
Dataset Splits: Yes
LLM response: "We set d = 100, m ∈ {5, 10, 25}, n = 80. We generate the ground-truth matrix U ∈ R^{d×m} where each entry is sampled from the standard normal distribution. The training data points, collected in the data matrix X ∈ R^{n×d}, are the first n rows of a randomly generated orthogonal matrix. ... We also generate 500 data points from the same distribution for testing."
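The quoted data-generating process can be sketched in a few lines of NumPy. This is our own illustrative reconstruction (the paper releases no code); the function name `make_quadratic_data` and the use of a QR decomposition to obtain the random orthogonal matrix are assumptions, not the authors' implementation.

```python
import numpy as np

def make_quadratic_data(n=80, d=100, m=5, noise_var=0.25, rng=None):
    """Sketch of the quadratic regression data model described in the paper."""
    rng = np.random.default_rng(rng)
    # Ground-truth matrix U in R^{d x m}, entries i.i.d. standard normal.
    U = rng.standard_normal((d, m))
    # X in R^{n x d}: first n rows of a randomly generated d x d orthogonal
    # matrix, obtained here via QR decomposition of a Gaussian matrix.
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    X = Q[:n, :]
    # Labels: y_i = (1/(md)) * sum_{j=1}^m (X_i^T u_j)^2 + eps_i.
    eps = np.sqrt(noise_var) * rng.standard_normal(n)
    y = ((X @ U) ** 2).sum(axis=1) / (m * d) + eps
    return X, y, U
```

The same sampler can produce the held-out split by drawing a fresh batch of rows and labels from the same distribution.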
Hardware Specification: No
LLM response: "The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models, memory, or cloud instance types."
Software Dependencies: No
LLM response: "The paper does not mention specific software dependencies, such as libraries or frameworks with version numbers, that were used for implementing the experimental simulations."
Experiment Setup: Yes
LLM response: "We set d = 100, m ∈ {5, 10, 25}, n = 80. ... We set the step-size η such that max_{1≤i≤n} a_i, defined in Theorem 3.2, belongs to the intervals of the first four phases. In particular, we choose η ∈ {0.3, 0.9, 1, 1.2, 1.8} for m = 5, 10 and η ∈ {0.3, 0.9, 1, 1.2, 1.6} for m = 25. ... The labels are generated via the model in Section 3.2, i.e., y_i = (1/(md)) Σ_{j=1}^m (X_i^⊤ u_j)^2 + ε_i, where ε_i is scalar noise sampled from a zero-mean normal distribution, with variance equal to 0, 0.25, or 1 in different experiments."
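A minimal gradient-descent loop on the quadratic regression model makes the role of the constant step-size η concrete. This is a hedged sketch under our own assumptions (squared loss averaged over the n points, random Gaussian initialization, function name `gd_quadratic_regression`), not the authors' code; it only illustrates the kind of training run whose loss trajectory the paper's phases describe.

```python
import numpy as np

def gd_quadratic_regression(X, y, m, eta=0.3, steps=200, rng=None):
    """Constant-step-size GD on f(x; W) = (1/(md)) * sum_j (x^T w_j)^2
    with mean squared loss. Returns the final W and the loss trajectory."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    W = rng.standard_normal((d, m))  # random initialization (our assumption)
    losses = []
    for _ in range(steps):
        preds = ((X @ W) ** 2).sum(axis=1) / (m * d)
        resid = preds - y
        losses.append(0.5 * np.mean(resid ** 2))
        # Gradient of (1/2n) * sum_i resid_i^2 with respect to W:
        # (2/(m d n)) * sum_i resid_i * x_i (x_i^T W), vectorized below.
        grad = (2.0 / (m * d * n)) * (X.T @ (resid[:, None] * (X @ W)))
        W -= eta * grad
    return W, np.array(losses)
```

With a small η the loss trajectory decreases monotonically; pushing η toward the larger values quoted above is where the non-monotonic (and eventually divergent) phases would appear.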