Nesterov acceleration in benignly non-convex landscapes
Authors: Kanan Gupta, Stephan Wojtowytsch
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate in Figure 3 that our assumptions are locally reasonable in deep learning. We trained a fully connected neural network (with 10 layers, width 35, tanh activation) to fit labels yᵢ at 100 randomly generated datapoints xᵢ ∈ ℝ¹². The small dataset size allowed us to use the exact gradient and loss function instead of stochastic approximations, for a better exploration of the loss landscape. Since the closest minimizer is generally unknown, we use the gradient as a proxy and examine the convexity of ϕ(t) = L(w + tg) for w very close to the set of global minimizers of the loss function L as in (2) and g = ∇L(w)/‖∇L(w)‖. Labels were generated using a randomly initialized teacher network (with 7 layers and width 20). Student networks were trained for 10,000 epochs using stochastic gradient descent with Nesterov momentum, with learning rate η = 0.005 and momentum ρ = 0.99. Final training loss ranged between 10⁻¹² and 10⁻⁹ across the five runs. Second derivatives were approximated using second-order difference quotients ϕ″(t) ≈ (ϕ(t+h) − 2ϕ(t) + ϕ(t−h))/h² for h = 0.01. Similarly, the strong aiming parameter with respect to the global minimizer was estimated by 2(ϕ′(t)·t − ϕ(t) + inf ϕ)/t², where ϕ′(t) was estimated by the central difference (ϕ(t+h) − ϕ(t−h))/(2h). |
| Researcher Affiliation | Academia | Kanan Gupta, Stephan Wojtowytsch, Department of Mathematics, University of Pittsburgh, EMAIL, EMAIL |
| Pseudocode | No | The paper describes algorithms such as the time-stepping scheme (5) and the AGNES scheme (10) using mathematical equations, but it does not present them in a structured 'Pseudocode' or 'Algorithm' block format. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | The paper states, "We trained a fully connected neural network... to fit labels yᵢ at 100 randomly generated datapoints xᵢ ∈ ℝ¹². ... Labels were generated using a randomly initialized teacher network." This indicates the use of synthetically generated data without providing specific access information for a publicly available dataset. |
| Dataset Splits | No | The paper mentions using "100 randomly generated datapoints" but does not provide any specific information regarding training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only mentions training a neural network. |
| Software Dependencies | No | The paper mentions using "stochastic gradient descent with Nesterov momentum" and "tanh activation" but does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow) for the implementation. |
| Experiment Setup | Yes | Student networks were trained for 10,000 epochs using stochastic gradient descent with Nesterov momentum, with learning rate η = 0.005 and momentum ρ = 0.99. ... Second derivatives were approximated using second-order difference quotients ϕ″(t) ≈ (ϕ(t+h) − 2ϕ(t) + ϕ(t−h))/h² for h = 0.01. Similarly, the strong aiming parameter with respect to the global minimizer was estimated by 2(ϕ′(t)·t − ϕ(t) + inf ϕ)/t², where ϕ′(t) was estimated by the central difference (ϕ(t+h) − ϕ(t−h))/(2h). |
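The convexity probe described in the table above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it substitutes a toy quadratic loss for the neural-network loss L, and the names (`loss`, `grad`, `w`, `mu_hat`) are assumptions for illustration. The finite-difference formulas match those quoted from the paper.

```python
import math
import random

# Toy stand-in for the paper's neural-network loss L (an assumption for
# illustration; the paper uses a 10-layer tanh network's training loss).
def loss(w):
    return 0.5 * sum(x * x for x in w)

def grad(w):
    # Gradient of the toy quadratic loss above.
    return list(w)

rng = random.Random(0)
w = [rng.gauss(0.0, 1.0) for _ in range(12)]      # a point in R^12, as in the paper

# Unit gradient direction g = grad L(w) / ||grad L(w)||, used as a proxy
# for the direction toward the (unknown) closest minimizer.
gnorm = math.sqrt(sum(x * x for x in grad(w)))
g = [x / gnorm for x in grad(w)]

def phi(t):
    # One-dimensional slice phi(t) = L(w + t g) whose convexity is examined.
    return loss([wi + t * gi for wi, gi in zip(w, g)])

h, t = 0.01, 0.5
# Second-order difference quotient: phi''(t) ~ (phi(t+h) - 2 phi(t) + phi(t-h)) / h^2
phi_dd = (phi(t + h) - 2 * phi(t) + phi(t - h)) / h ** 2
# Central difference for phi'(t), feeding the strong-aiming estimate
phi_d = (phi(t + h) - phi(t - h)) / (2 * h)
# Strong-aiming estimate: 2 (phi'(t) t - phi(t) + inf phi) / t^2.
# For the toy loss the infimum is known to be 0; in the paper it is the
# (near-zero) final training loss.
inf_phi = 0.0
mu_hat = 2 * (phi_d * t - phi(t) + inf_phi) / t ** 2

print(phi_dd > 0)   # local convexity of the slice at t
```

For the quadratic stand-in the difference quotients are exact up to rounding (phi″ ≡ 1 along any unit direction), which makes the sketch easy to sanity-check before swapping in a real network loss.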