Simple and Optimal Stochastic Gradient Methods for Nonsmooth Nonconvex Optimization

Authors: Zhize Li, Jian Li

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We propose and analyze several stochastic gradient algorithms for finding stationary points or local minima in nonconvex finite-sum and online optimization problems, possibly with a nonsmooth regularizer. First, we propose a simple proximal stochastic gradient algorithm based on variance reduction, called Prox SVRG+. We provide a clean and tight analysis of Prox SVRG+, which shows that it outperforms deterministic proximal gradient descent (Prox GD) for a wide range of minibatch sizes, thereby solving an open problem posed in Reddi et al. (2016b). Prox SVRG+ also uses far fewer proximal oracle calls than Prox SVRG (Reddi et al., 2016b) and extends to the online setting by avoiding full gradient computations. Then, we propose an optimal algorithm, called SSRGD, based on SARAH (Nguyen et al., 2017), and show that SSRGD further improves the gradient complexity of Prox SVRG+ and achieves the optimal upper bound, matching the known lower bound of Fang et al. (2018) and Li et al. (2021). Moreover, we show that both Prox SVRG+ and SSRGD automatically adapt to local structure of the objective function, such as the Polyak-Łojasiewicz (PL) condition for nonconvex functions in the finite-sum case; i.e., we prove that both can automatically switch to faster global linear convergence without the restarts required in the prior work Prox SVRG (Reddi et al., 2016b). Finally, we focus on the more challenging problem of finding an (ϵ, δ)-local minimum rather than merely an ϵ-approximate (first-order) stationary point, which may be a bad, unstable saddle point. We show that SSRGD can find an (ϵ, δ)-local minimum simply by adding random perturbations. The resulting algorithm is almost as simple as its counterpart for finding stationary points and achieves similar optimal rates.
Researcher Affiliation | Academia | Zhize Li (EMAIL), Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Jian Li (EMAIL), Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
Pseudocode | Yes | Algorithm 1: Prox SVRG+; Algorithm 2: SSRGD; Algorithm 3: SSRGD (full version for finding approximate local minima)
Open Source Code | No | The paper does not contain any explicit statements about the release of source code for the methodology described, nor does it provide links to a code repository. The license mentioned ("License: CC-BY 4.0") refers to the publication license for the paper itself, not an open-source code release.
Open Datasets | No | The paper is theoretical and focuses on optimization algorithms for problems like "nonsmooth (composite) nonconvex optimization problems" and "standard empirical risk minimization problems." It does not describe experiments using specific datasets, nor does it provide any links, DOIs, or citations for publicly available datasets that were used or are being made available.
Dataset Splits | No | The paper is purely theoretical, analyzing optimization algorithms and their convergence properties. It does not perform experiments on specific datasets, and therefore no information regarding dataset splits (e.g., training, testing, validation) is provided.
Hardware Specification | No | The paper is a theoretical work focusing on the proposal and analysis of stochastic gradient algorithms. It does not report on any empirical experiments or computational evaluations that would require specific hardware, and thus no hardware specifications are mentioned.
Software Dependencies | No | The paper presents theoretical algorithms and their mathematical analysis. It does not describe any implementation details, software packages, or libraries with specific version numbers that were used in experiments.
Experiment Setup | No | This paper is theoretical in nature, focusing on the derivation and analysis of optimization algorithms rather than empirical evaluation. Consequently, it does not provide details on experimental setups, such as specific hyperparameter values, learning rates, batch sizes, or model initialization strategies.
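Since the paper itself ships no code, the two gradient estimators at the heart of the abstract (the snapshot-based SVRG correction in Prox SVRG+ and the recursive SARAH correction in SSRGD) can be illustrated with an unofficial NumPy sketch on a small Lasso instance. The function names, step size, batch size, and epoch counts below are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def prox_l1(x, t):
    """Soft-thresholding: proximal operator of t * ||.||_1 (the nonsmooth step)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def run(A, y, lam=0.1, eta=0.05, epochs=20, inner=50, batch=4,
        estimator="svrg", seed=0):
    """Minimize (1/2n)||Ax - y||^2 + lam * ||x||_1.

    estimator="svrg"  -> snapshot-based correction, as in Prox SVRG+
    estimator="sarah" -> recursive SARAH-style correction, as in SSRGD
    All hyperparameters here are illustrative, not the paper's.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    grad = lambda z, idx: A[idx].T @ (A[idx] @ z - y[idx]) / len(idx)
    x = np.zeros(d)
    for _ in range(epochs):
        snap = x.copy()
        g_full = grad(snap, np.arange(n))      # full gradient at epoch start
        if estimator == "sarah":
            x_prev, v = x.copy(), g_full
            x = prox_l1(x - eta * v, eta * lam)
        for _ in range(inner):
            idx = rng.integers(0, n, size=batch)
            if estimator == "svrg":
                # minibatch gradient corrected against the fixed snapshot
                v = grad(x, idx) - grad(snap, idx) + g_full
            else:
                # recursive correction against the previous iterate
                v = grad(x, idx) - grad(x_prev, idx) + v
                x_prev = x.copy()
            x = prox_l1(x - eta * v, eta * lam)  # proximal gradient step
    return x
```

The only change between the two modes is which reference point anchors the variance-reduction term: Prox SVRG+ corrects against a fixed per-epoch snapshot, while the SARAH estimator in SSRGD corrects against the immediately preceding iterate, carrying the estimator forward recursively.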