Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions

Authors: Ashia C. Wilson, Lester Mackey, Andre Wibisono

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide several examples of strongly smooth loss functions in machine learning and numerical experiments that verify our theoretical findings.
Researcher Affiliation | Collaboration | Ashia C. Wilson (Microsoft Research, EMAIL), Lester Mackey (Microsoft Research, EMAIL), Andre Wibisono (Georgia Tech, EMAIL)
Pseudocode | Yes | Algorithm 1: Nesterov-style accelerated rescaled gradient descent.
Require: f satisfies (13) and h satisfies D_h(x, y) ≥ (1/p)‖x − y‖^p.
1: Set x_0 = z_0, A_k = (δ/p)^p k^(p), α_k = (A_{k+1} − A_k)/δ, τ_k = α_k/A_{k+1}, and δ^(p/(p−1)) = η^(1/(p−1))/2.
2: for k = 1, …, K do
3:   x_k = δτ_k z_k + (1 − δτ_k) y_k
4:   z_{k+1} = argmin_{z ∈ X} { α_k ⟨∇f(x_k), z⟩ + (1/δ) D_h(z, z_k) }
5:   y_{k+1} = x_k − η^(1/(p−1)) B^(−1) ∇f(x_k)/‖∇f(x_k)‖^((p−2)/(p−1))
6: return y_K
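As a concreteness check on the pseudocode above, the recursion can be sketched in NumPy. This is a minimal sketch, not the authors' implementation (their code is in the linked ARGD repository): it specializes to the Euclidean case B = I with h(x) = (1/2)‖x‖², and it assumes the loop runs from k = 0 with y_0 = z_0 = x_0 and that the mirror step has the closed form induced by D_h(z, z_k) = (1/p)‖z − z_k‖^p. The function and variable names are illustrative, not from the paper.

```python
# Sketch of Algorithm 1 (Nesterov-style accelerated rescaled gradient descent),
# Euclidean special case. Coefficients follow the pseudocode:
#   A_k = (delta/p)^p * k^(p), with k^(p) = k(k+1)...(k+p-1) (rising factorial),
#   alpha_k = (A_{k+1} - A_k)/delta,  tau_k = alpha_k / A_{k+1},
#   delta^(p/(p-1)) = eta^(1/(p-1)) / 2.
# Assumptions (not stated in the extracted pseudocode): k starts at 0,
# y_0 = z_0 = x_0, B = I, and D_h(z, z_k) = (1/p)||z - z_k||^p.
import numpy as np

def rising_factorial(k, p):
    """k^(p) = k (k+1) ... (k+p-1)."""
    out = 1
    for i in range(p):
        out *= k + i
    return out

def accelerated_rgd(grad_f, x0, eta, p=2, n_iters=100):
    delta = (eta ** (1.0 / (p - 1)) / 2.0) ** ((p - 1.0) / p)
    A = lambda k: (delta / p) ** p * rising_factorial(k, p)
    y = z = np.asarray(x0, dtype=float)
    for k in range(n_iters):
        alpha = (A(k + 1) - A(k)) / delta
        tau = alpha / A(k + 1)
        # Line 3: coupling step
        x = delta * tau * z + (1 - delta * tau) * y
        g = grad_f(x)
        gnorm = np.linalg.norm(g)
        if gnorm == 0.0:  # stationary point reached
            return x
        scale = gnorm ** ((p - 2.0) / (p - 1.0))
        # Line 4: mirror step; for D_h(z, z_k) = (1/p)||z - z_k||^p the
        # argmin has the closed form below
        z = z - (delta * alpha) ** (1.0 / (p - 1)) * g / scale
        # Line 5: rescaled gradient step (B = I)
        y = x - eta ** (1.0 / (p - 1)) * g / scale
    return y

# Toy check on f(x) = (1/2)||x||^2, where p = 2 makes the rescaled step
# coincide with plain gradient descent of step size eta.
y_final = accelerated_rgd(grad_f=lambda x: x, x0=np.array([3.0, -2.0]),
                          eta=0.5, p=2, n_iters=500)
print(0.5 * np.dot(y_final, y_final))  # objective value, close to 0
```

For p = 2 the exponents (p − 2)/(p − 1) and 1/(p − 1) reduce to 0 and 1, recovering the familiar Nesterov three-sequence scheme; larger p changes both the step normalization and the growth rate of A_k.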
Open Source Code | Yes | The code for these experiments can be found here: https://github.com/aswilson07/ARGD.git.
Open Datasets | No | For the logistic and ℓ4 losses, we use the same code, plots, and experimental methodology of Zhang et al. [36] (including data and step-size choice), adding to it (A)RGD. The paper mentions using data from [36] but does not provide direct access information (link, DOI, repository, or an explicit citation for the dataset itself).
Dataset Splits | No | The paper describes the data generation process but does not provide specific details on training, validation, or test dataset splits, or on how data was partitioned for the experiments.
Hardware Specification | No | The paper describes numerical experiments but does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running them.
Software Dependencies | No | The paper mentions that code is available on GitHub but does not explicitly list software dependencies with specific version numbers within the text.
Experiment Setup | No | The paper mentions step-size choices and constraints (e.g., "the largest step-size was chosen subject to the algorithm not diverging"), but it does not provide specific numerical values for hyperparameters or other detailed system-level training settings used in the experiments.