Gradient Descent Learns Linear Dynamical Systems

Authors: Moritz Hardt, Tengyu Ma, Benjamin Recht

JMLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we provide proof-of-concept experiments on synthetic data. We demonstrate that (1) plain SGD tends to blow up even with a relatively small learning rate, especially on hard instances; (2) SGD with our projection step converges with a reasonably large learning rate, and with over-parameterization the final error is competitive; (3) SGD with gradient clipping has the strongest performance in terms of both convergence speed and final error. Our experiments suggest that the landscape of the objective function may be even nicer than what is predicted by our theoretical development.
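The quoted findings compare plain SGD against variants with projection and with gradient clipping. A minimal NumPy sketch of the gradient-clipping update is below; the function names, the clipping threshold `max_norm`, and the default learning rate are illustrative assumptions, not the paper's Algorithm 1–3 (the 0.01 learning rate matches the experiment setup row):

```python
import numpy as np

def clip_gradient(grad, max_norm=1.0):
    """Rescale grad so its Euclidean norm is at most max_norm.

    max_norm is an assumed hyperparameter; the paper does not state its value.
    """
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

def sgd_step(theta, grad, lr=0.01, max_norm=1.0):
    """One SGD step with gradient clipping; lr=0.01 matches the paper's setup."""
    return theta - lr * clip_gradient(grad, max_norm)
```

Clipping bounds the size of every parameter update by `lr * max_norm`, which is one plausible reason the quoted evidence reports that it avoids the blow-ups seen with plain SGD.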
Researcher Affiliation | Collaboration | Moritz Hardt (EMAIL), Department of Electrical Engineering and Computer Science, University of California, Berkeley; Tengyu Ma (EMAIL), Facebook AI Research; Benjamin Recht (EMAIL), Department of Electrical Engineering and Computer Science, University of California, Berkeley
Pseudocode | Yes | Algorithm 1: Projected stochastic gradient descent with partial loss; Algorithm 2: Projected stochastic gradient descent for long sequences; Algorithm 3: Back-propagation
Open Source Code | No | The paper does not contain any explicit statements about code availability, links to repositories, or mentions of code in supplementary materials.
Open Datasets | No | We generate the true system with state dimension d = 20 by randomly picking the conjugate pairs of roots of the characteristic polynomial inside the circle with radius ρ = 0.95 and randomly generating the vector C from the standard normal distribution. The inputs of the dynamical model are generated from the standard normal distribution with length T = 500.
Dataset Splits | No | The inputs of the dynamical model are generated from the standard normal distribution with length T = 500. We note that we generate fresh inputs and outputs at every iteration, and therefore the training loss is equal to the test loss (in expectation).
Hardware Specification | No | The paper describes the experimental setup in Section 8 "Simulations" but does not specify any hardware details like GPU/CPU models, memory, or cloud instance types.
Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers (e.g., programming languages, libraries, frameworks, or solvers).
Experiment Setup | Yes | We use an initial learning rate of 0.01 for projected gradient descent and SGD with gradient clipping. We use batch size 100 for all experiments, and decay the learning rate by a factor of 10 at the 200K and 250K iterations in all experiments.
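The quoted schedule (initial rate 0.01, divided by 10 at 200K and again at 250K iterations) can be sketched as a small step-decay function; the function name is a hypothetical helper, not something from the paper:

```python
def learning_rate(iteration, base_lr=0.01):
    """Step schedule from the quoted setup: decay by 10x at 200K and 250K iterations."""
    lr = base_lr
    if iteration >= 200_000:
        lr /= 10
    if iteration >= 250_000:
        lr /= 10
    return lr
```

So the rate is 0.01 for the first 200K iterations, 0.001 until 250K, and 0.0001 thereafter.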