A Stochastic Proximal Polyak Step Size
Authors: Fabian Schaipp, Robert M. Gower, Michael Ulbrich
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we develop ProxSPS... Furthermore, for image classification tasks, ProxSPS performs as well as AdamW with little to no tuning... We also provide an extensive convergence analysis for ProxSPS... We perform a series of experiments comparing ProxSPS, SPS, SGD and AdamW when using ℓ2-regularization. |
| Researcher Affiliation | Academia | Fabian Schaipp (EMAIL), Department of Mathematics, Technical University of Munich; Robert M. Gower (rgower@flatironinstitute.org), Center for Computational Mathematics, Flatiron Institute, New York; Michael Ulbrich (EMAIL), Department of Mathematics, Technical University of Munich |
| Pseudocode | Yes | Algorithm 1: SPS; Algorithm 2: ProxSPS; Algorithm 3: ProxSPS for ϕ(x) = (λ/2)‖x‖² |
| Open Source Code | Yes | The code for our experiments and an implementation of ProxSPS can be found at https://github.com/fabian-sp/ProxSPS. |
| Open Datasets | Yes | We also show similar results for image classification over the CIFAR10 and Imagenet32 datasets... We train a ResNet56 and ResNet110 model (He et al., 2016) on the CIFAR10 dataset... Further, we trained a ResNet110 with batch norm on the Imagenet32 dataset. |
| Dataset Splits | Yes | The CIFAR10 dataset consists of 60,000 images... We use the PyTorch split into 50,000 training and 10,000 test examples and use a batch size of 128. ...We split the nonzero measurements into a training set of size \|T\| = 44926 ≈ 0.8 · 56158 and the rest as a validation set. |
| Hardware Specification | No | The paper does not explicitly state the specific hardware used for running its experiments, such as GPU or CPU models. It mentions training ResNet models and using PyTorch, which implies the use of computational hardware, but lacks specific details. |
| Software Dependencies | No | For all experiments we use PyTorch (Paszke et al., 2019). The text mentions a software dependency (PyTorch) but does not provide a specific version number. |
| Experiment Setup | Yes | For SPS and ProxSPS we always use C(s) = 0 for all s ∈ S. For α_k, we use the following schedules: constant: set α_k = α_0 for all k and some α_0 > 0; sqrt: set α_k = α_0/√j for all iterations k during epoch j... For AdamW, we set the weight decay parameter to λ and set all other hyperparameters to their defaults... For SPS and ProxSPS we use the sqrt-schedule and α_0 = 1. ...We use a batch size of 128... We use batch size 512. |
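Based on the closed-form update the paper reports for Algorithm 3 (ProxSPS with ϕ(x) = (λ/2)‖x‖² and C(S) = 0), a single step might be sketched as follows. This is an illustrative sketch, not the authors' implementation; the function name and argument layout are assumptions.

```python
import numpy as np

def proxsps_l2_step(x, f_val, g, alpha, lam):
    """One ProxSPS update for phi(x) = (lam/2)*||x||^2 with C(S) = 0 (a sketch).

    x      : current iterate
    f_val  : mini-batch loss f_S(x), assumed nonnegative
    g      : mini-batch gradient of f_S at x
    alpha  : step-size parameter alpha_k (e.g. from the sqrt schedule)
    lam    : l2-regularization parameter lambda
    """
    sq_norm = float(np.dot(g, g))
    if sq_norm == 0.0:
        tau = 0.0  # zero gradient: only the proximal shrinkage acts
    else:
        # Polyak-type step size adjusted for the l2 proximal term,
        # clipped at 0 from below and capped at alpha from above.
        tau_plus = ((1 + alpha * lam) * f_val - alpha * lam * np.dot(g, x)) / sq_norm
        tau = min(alpha, max(tau_plus, 0.0))
    # Gradient step followed by the closed-form prox of (lam/2)*||.||^2.
    return (x - tau * g) / (1 + alpha * lam)
```

For the sqrt schedule described above, `alpha` would be set to α_0/√j during epoch j. Note that with `lam = 0` the update reduces to the plain (unregularized) SPS step x − min(α, f_S(x)/‖g‖²)·g.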