Online Stochastic Gradient Descent with Arbitrary Initialization Solves Non-smooth, Non-convex Phase Retrieval
Authors: Yan Shuo Tan, Roman Vershynin
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we prove that, in fact, constant step size online stochastic gradient descent (SGD) converges from arbitrary initializations for the non-smooth, non-convex amplitude squared loss objective. In this setting, online SGD is also equivalent to the randomized Kaczmarz algorithm from numerical analysis. Our analysis can easily be generalized to other single index models. It also makes use of new ideas from stochastic process theory, including the notion of a summary state space, which we believe will be of use for the broader field of non-convex optimization. |
| Researcher Affiliation | Academia | Yan Shuo Tan (EMAIL), Department of Statistics and Data Science, National University of Singapore, Singapore; Roman Vershynin (EMAIL), Department of Mathematics, University of California, Irvine, CA 92697-3875, USA |
| Pseudocode | No | More formally, we form a sequence of signal estimates x^(0), x^(1), x^(2), . . . with the update rule: x^(k+1) := x^(k) + η (sign(⟨a^(k), x^(k)⟩) b^(k) − ⟨a^(k), x^(k)⟩) a^(k). (3) We typically choose η = 1/d. The paper presents the algorithm's update rule as a mathematical equation (3) within a paragraph, not as a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the methodology described is publicly available. |
| Open Datasets | No | We make the following assumptions for the rest of the paper. Assumption 1 Assume the following for each positive integer k: (C) (Gaussian measurements) We have a^(k) ∼ Unif(√d · S^(d−1)), where S^(d−1) is the unit sphere in R^d. (D) (No noise) We have b^(k) = abs(⟨a^(k), x⟩). The paper defines its own theoretical "measurements" and "signal vectors" based on specific assumptions (e.g., Gaussian measurements) as part of its mathematical model, rather than utilizing external, publicly available datasets for empirical evaluation. |
| Dataset Splits | No | The paper does not describe any experimental evaluation using a dataset; therefore, no dataset splits are provided. |
| Hardware Specification | No | The paper is theoretical and focuses on mathematical proofs and analysis of an algorithm's convergence properties. It does not describe any experiments that would necessitate specifying hardware. |
| Software Dependencies | No | The paper is theoretical and focuses on mathematical proofs and analysis of an algorithm. It does not describe any software dependencies or specific version numbers for implementation. |
| Experiment Setup | No | The paper is theoretical, presenting mathematical proofs and analysis of stochastic gradient descent. It does not include an empirical experimental setup, thus no hyperparameters or training configurations are provided. |
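Although the paper gives the method only as equation (3), the quoted update rule is concrete enough to simulate. The sketch below is an illustrative NumPy implementation under the paper's stated assumptions (unit-norm signal, measurement vectors uniform on the sphere of radius √d, noiseless amplitude measurements, step size η = 1/d); the variable names and the random initialization are our own choices, not the authors'.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_steps = 50, 20_000

# Ground-truth unit-norm signal x*.
x_star = rng.standard_normal(d)
x_star /= np.linalg.norm(x_star)

# Arbitrary (here: random Gaussian) initialization, as the title allows.
x = rng.standard_normal(d)

eta = 1.0 / d  # the step size the paper says is typically chosen
for _ in range(n_steps):
    # Measurement vector uniform on the sphere of radius sqrt(d)  (Assumption (C)).
    a = rng.standard_normal(d)
    a *= np.sqrt(d) / np.linalg.norm(a)
    b = abs(a @ x_star)  # noiseless amplitude measurement (Assumption (D))
    inner = a @ x
    # Online SGD / randomized Kaczmarz update, equation (3).
    x = x + eta * (np.sign(inner) * b - inner) * a

# Phase retrieval recovers x* only up to a global sign.
err = min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star))
print(f"recovery error (up to sign): {err:.2e}")
```

With η = 1/d and ‖a^(k)‖² = d, each step is exactly the randomized Kaczmarz projection onto the hyperplane consistent with the current sign guess, which is the equivalence the abstract mentions; in simulation the error typically decays to near zero from a random start.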