On Biased Stochastic Gradient Estimation
Authors: Derek Driggs, Jingwei Liang, Carola-Bibiane Schönlieb
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "5. Numerical Experiments: In this section, we present numerical experiments testing B-SAGA, B-SVRG, SARAH, and SARGE for minimizing convex, strongly convex, and non-convex objectives. We include one set of experiments comparing different values of θ in B-SAGA and B-SVRG with a fixed step size and one set comparing SARAH and SARGE to B-SAGA and B-SVRG with the best values of θ." |
| Researcher Affiliation | Academia | Derek Driggs, Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, CB3 0WA, UK; Jingwei Liang, Institute of Natural Sciences and School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China; Carola-Bibiane Schönlieb, Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, CB3 0WA, UK |
| Pseudocode | Yes | Algorithm 1 (Stochastic gradient descent framework). Input: starting point x0 ∈ R^p, gradient estimator ∇̃. 1: for k = 0, 1, …, T−1 do; 2: Compute the stochastic gradient estimate ∇̃k at the current iterate xk; 3: Choose the step size/learning rate ηk; 4: Update xk+1: xk+1 ← prox_{ηk g}(xk − ηk ∇̃k). (2) |
| Open Source Code | Yes | "See https://github.com/derekdriggs/StochOpt for MATLAB scripts reproducing these experiments." |
| Open Datasets | Yes | "We consider four binary classification data sets: australian, mushrooms, phishing, and ijcnn1 from LIBSVM" (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/) |
| Dataset Splits | No | "We consider four binary classification data sets: australian, mushrooms, phishing, and ijcnn1 from LIBSVM. We rescale the value of the data to [−1, 1], set β = 1/n, and set the step size to η = 1/(5L)." Performance is compared via the objective gap F(xk) − F(x*). The text mentions a "training set" but does not explain how this set was derived from the full datasets, nor does it mention any test or validation splits. |
| Hardware Specification | No | No specific hardware details are provided in the paper for running the experiments. The paper only refers to "numerical experiments" without specifying the computational environment. |
| Software Dependencies | No | "See https://github.com/derekdriggs/StochOpt for MATLAB scripts reproducing these experiments." The repository provides MATLAB scripts, but neither a MATLAB version nor the versions of any required toolboxes are specified. |
| Experiment Setup | Yes | "We rescale the value of the data to [−1, 1], set β = 1/n, and set the step size to η = 1/(5L)." Also: "We found that small step sizes generally lead to stationary points with smaller objective values, so we set η = 1/(5n) for all our experiments." And: "Every test is initialized using a random vector with normally distributed i.i.d. entries, and the same starting point is used for testing each value of θ." |
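The proximal stochastic-gradient framework quoted in the Pseudocode row (Algorithm 1) can be sketched as follows. The paper's own scripts are in MATLAB; this is a minimal Python/NumPy illustration, not the authors' code, assuming for concreteness a least-squares loss with an ℓ1 regularizer g = β‖·‖₁ (so prox is soft-thresholding), a plain unbiased one-sample gradient estimator in place of the biased B-SAGA/B-SVRG/SARGE estimators, and the step size η = 1/(5L) and random normal initialization described in the Experiment Setup row.

```python
import numpy as np

def prox_l1(x, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sgd_prox(A, b, beta, eta, T, rng):
    """Algorithm 1 sketch: x_{k+1} = prox_{eta*g}(x_k - eta * grad_est),
    with g = beta*||.||_1 and f_i(x) = 0.5*(a_i^T x - b_i)^2.
    A plain unbiased SGD estimator stands in for the paper's biased ones."""
    n, p = A.shape
    x = rng.standard_normal(p)                # normally distributed i.i.d. start
    for _ in range(T):
        i = rng.integers(n)                   # sample one component uniformly
        grad_est = (A[i] @ x - b[i]) * A[i]   # stochastic gradient estimate
        x = prox_l1(x - eta * grad_est, eta * beta)
    return x

# Synthetic problem (illustrative only; the paper uses LIBSVM data sets).
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 10))
b = A @ prox_l1(rng.standard_normal(10), 0.5)
L = np.max(np.sum(A**2, axis=1))  # bound on the component smoothness constants
x_hat = sgd_prox(A, b, beta=1.0 / 100, eta=1.0 / (5 * L), T=5000, rng=rng)
```

Swapping `grad_est` for a SAGA- or SVRG-style estimator (with the bias parameter θ the paper studies) changes only line 14; the prox-update skeleton is unchanged, which is why the paper can analyze all estimators in one framework.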