On Learning Rates and Schrödinger Operators

Authors: Bin Shi, Weijie Su, Michael I. Jordan

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Understanding the iterative behavior of stochastic optimization algorithms for minimizing nonconvex functions remains a crucial challenge in demystifying deep learning. In particular, it is not yet understood why certain simple techniques are remarkably effective for tuning the learning rate in stochastic gradient descent (SGD)... As a numerical illustration of this complexity, Figure 1 plots the error of SGD with a piecewise constant learning rate in the training of a neural network on the CIFAR-10 dataset.
Researcher Affiliation | Academia | Bin Shi (EMAIL), Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China, and School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China. Weijie J. Su (EMAIL), Department of Statistics and Data Science, University of Pennsylvania, Philadelphia, PA 19104, USA. Michael I. Jordan (EMAIL), Department of Electrical Engineering and Computer Sciences and Department of Statistics, University of California, Berkeley, CA 94720, USA.
Pseudocode | No | The paper describes algorithms such as SGD and SGLD using mathematical equations (e.g., x_{k+1} = x_k − s∇f(x_k)) and continuous-time SDEs. However, it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps.
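For concreteness, the discrete update the paper writes in equation form can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the quadratic objective, step count, and the sqrt(s)-scaled noise in the SGLD variant are assumptions chosen for a self-contained toy example (the paper's own noise scaling depends on its temperature parameterization).

```python
import numpy as np

def sgd_step(x, grad, s):
    """One gradient-descent step: x_{k+1} = x_k - s * grad_f(x_k)."""
    return x - s * grad(x)

def sgld_step(x, grad, s, rng):
    """Langevin-style step: gradient step plus Gaussian noise.
    The sqrt(s) noise scale here is illustrative only."""
    return x - s * grad(x) + np.sqrt(s) * rng.standard_normal()

# Toy objective: f(x) = x^2 / 2, so grad_f(x) = x.
grad_f = lambda z: z

x = 1.0
for _ in range(100):
    x = sgd_step(x, grad_f, s=0.1)
# x contracts by a factor (1 - s) per step, approaching the minimizer 0
```

With a constant learning rate s, each step multiplies the iterate by (1 − s) on this quadratic, which is the deterministic skeleton underlying the SGD/SGLD dynamics the paper analyzes.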
Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the methodology described, nor does it provide links to any code repositories. The license information provided is for the paper itself, not for accompanying code.
Open Datasets | Yes | As a numerical illustration of this complexity, Figure 1 plots the error of SGD with a piecewise constant learning rate in the training of a neural network on the CIFAR-10 dataset. With a constant learning rate, SGD quickly reaches a plateau in terms of training error, and whenever the learning rate decreases, the plateau decreases as well, thereby yielding better optimization performance. This illustration exemplifies the idea of learning rate decay, a technique that is used in training deep neural networks (see, e.g., He et al., 2016; Bottou et al., 2018; Sordello and Su, 2019).
Dataset Splits | No | Figure 1 mentions training a neural network on CIFAR-10 and reports 'training error'. However, the paper does not specify any training/validation/test splits, their percentages, or how they were created for reproduction.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the numerical illustrations or experiments, such as GPU models, CPU types, or memory configurations. It mentions only 'Matlab2019b' in the Figure 5 caption, which is software.
Software Dependencies | No | The paper mentions 'Matlab2019b' as a tool used for generating some figures ('using the noise generator state 1-10000 in Matlab2019b' in the Figure 5 caption). However, it does not provide a comprehensive list of software dependencies with specific version numbers required to replicate the experiments or implement the described methodology.
Experiment Setup | Yes | Figure 1: Training error using SGD with mini-batch size 32 to train an 8-layer convolutional neural network on CIFAR-10 (Krizhevsky, 2009). The first 90 epochs use a learning rate of s = 0.006, the next 120 epochs use s = 0.003, and the final 190 epochs use s = 0.0005. Figure 3: The learning rate is set to either s = 0.1 or s = 0.05. ... The gradient noise is drawn from the standard normal distribution. All results are averaged over 10000 independent replications.