Escaping Saddle-Points Faster under Interpolation-like Conditions
Authors: Abhishek Roy, Krishnakumar Balasubramanian, Saeed Ghadimi, Prasant Mohapatra
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we show that under over-parametrization several standard stochastic optimization algorithms escape saddle-points and converge to local-minimizers much faster. One of the fundamental aspects of over-parametrized models is that they are capable of interpolating the training data. We show that, under interpolation-like assumptions... the first-order oracle complexity of Perturbed Stochastic Gradient Descent (PSGD) algorithm to reach an $\epsilon$-local-minimizer, matches the corresponding deterministic rate of $O(1/\epsilon^2)$. (A common formalization of such interpolation conditions is sketched after this table.) |
| Researcher Affiliation | Academia | Abhishek Roy, Department of Statistics, University of California, Davis (EMAIL); Krishnakumar Balasubramanian, Department of Statistics, University of California, Davis (EMAIL); Saeed Ghadimi, Department of Management Sciences, University of Waterloo (EMAIL); Prasant Mohapatra, Department of Computer Science, University of California, Davis (EMAIL) |
| Pseudocode | Yes | Algorithm 1: Perturbed Stochastic Gradient Descent (a hedged runnable sketch is given after this table) |
| Open Source Code | No | The paper is theoretical and does not mention releasing any source code or providing links to a code repository for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not involve empirical training on datasets. No public dataset information is provided. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical validation on datasets. No information on training, validation, or test splits is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not mention any software dependencies with specific version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any empirical experiment setup details such as hyperparameters or training configurations. |
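
For context on the "interpolation-like assumptions" quoted above: such conditions are commonly formalized by coupling the stochastic gradients to the full gradient through a growth condition. The form below is the standard strong growth condition from the interpolation literature; whether the paper uses exactly this condition or a variant (e.g., an analogous condition on stochastic Hessians) is an assumption here, and the constant $\rho$ is an illustrative parameter, not a value from the paper.

```latex
% Strong growth condition (SGC): under interpolation, stochastic
% gradients vanish wherever the full gradient does.
% \rho >= 1 is an assumption-level constant, not from the paper.
\[
  \mathbb{E}_{\xi}\,\bigl\|\nabla f(x;\xi)\bigr\|^{2}
  \;\le\; \rho\,\bigl\|\nabla F(x)\bigr\|^{2},
  \qquad \text{where } F(x) := \mathbb{E}_{\xi}\bigl[f(x;\xi)\bigr].
\]
```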
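Since the paper provides pseudocode (Algorithm 1) but no released code, the following is a minimal runnable sketch of a perturbed-SGD loop in the style of that algorithm: ordinary SGD steps, plus an occasional random perturbation when the stochastic gradient is small, which is the step that helps escape strict saddle points. All function names, hyperparameter defaults, and the perturbation schedule below are illustrative assumptions, not the paper's tuned procedure.

```python
import numpy as np

def psgd(stoch_grad, x0, eta=0.05, radius=1e-2, grad_tol=1e-3,
         perturb_interval=50, n_iters=2000, rng=None):
    """Hedged sketch of a Perturbed SGD loop.

    stoch_grad(x) should return an unbiased stochastic gradient at x.
    When the stochastic gradient is small and we have not perturbed
    recently, add a uniform draw from a ball of the given radius.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.asarray(x0, dtype=float)
    last_perturb = -perturb_interval
    for t in range(n_iters):
        g = stoch_grad(x)
        if np.linalg.norm(g) <= grad_tol and t - last_perturb >= perturb_interval:
            # Uniform sample from a ball: Gaussian direction,
            # radius scaled by U^(1/d).
            xi = rng.normal(size=x.shape)
            xi *= radius * rng.uniform() ** (1.0 / x.size) / np.linalg.norm(xi)
            x = x + xi
            last_perturb = t
        else:
            x = x - eta * g  # ordinary SGD step
    return x

if __name__ == "__main__":
    # Toy nonconvex problem with a strict saddle at the origin:
    # F(x) = (x0^2 - 1)^2 + x1^2, plus small gradient noise.
    noise_rng = np.random.default_rng(1)

    def stoch_grad(x):
        g = np.array([4.0 * x[0] * (x[0] ** 2 - 1.0), 2.0 * x[1]])
        return g + 1e-3 * noise_rng.normal(size=2)

    print(psgd(stoch_grad, x0=np.zeros(2)))  # ends near (+/-1, 0)
```

Starting at the saddle (0, 0), the iterates escape along the unstable direction and settle near one of the two local minimizers at (±1, 0), illustrating the saddle-escape behavior the paper analyzes.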