Learning to Optimize Quasi-Newton Methods
Authors: Isaac Liao, Rumen Dangovski, Jakob Nicolaus Foerster, Marin Soljacic
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally verify that our algorithm can optimize in noisy settings, and show that simpler alternatives for representing the inverse Hessians worsen performance. Lastly, we use our optimizer to train a semi-realistic deep neural network with 95k parameters at speeds comparable to those of standard neural network optimizers. We present here a number of tasks which provide experimental evidence for the theoretical results we claim, though we test a variety of optimizers on these tasks. Section 5 describes experiments on: 5.1 Noisy Quadratic Bowl, 5.2 Rosenbrock Function, 5.3 Image Generation, 5.4 Image Classification. |
| Researcher Affiliation | Academia | Isaac C. Liao EMAIL Research Lab for Electronics, MIT; Rumen R. Dangovski EMAIL Research Lab for Electronics, MIT; Jakob N. Foerster EMAIL Department of Engineering Science, University of Oxford; Marin Soljačić EMAIL Research Lab for Electronics, MIT |
| Pseudocode | Yes | Algorithm 1: Learning to Optimize During Optimization (LODO). Require: f : ℝⁿ → ℝ (function to minimize); x₀ ∈ ℝⁿ (initialization); α ∈ ℝ (meta-learning rate, default 0.001); α₀ ∈ ℝ (initial learning rate, default 1.0); 0 ≤ β < 1 (momentum, default 0.9). t ← 0 (start time); θ₀ ← random initialization (initialization for G neural network); m₀ ← 0 (initialize momentum). While not converged: x_{t+1} ← x_t − G(θ_t) m_t (pick a step using G with Eqs. (1) and (2)); ℓ_{t+1} ← f(x_{t+1}) (compute loss after step); θ_{t+1} ← θ_t + Adam(∇_{θ_t} ℓ_{t+1}) (tune the G model to pick better steps); m_{t+1} ← β m_t + (1 − β) ∇_{x_{t+1}} ℓ_{t+1} (update momentum); t ← t + 1 (increment time). End while; return θ_t |
| Open Source Code | No | The paper does not provide a direct link to source code or explicitly state that the code will be made available in supplementary materials or a repository. |
| Open Datasets | Yes | We use our optimizer to train a semi-realistic deep neural network with 95k parameters in an autoregressive image generation task similar to training a Pixel CNN (Oord et al., 2016) to generate MNIST images (Lecun et al., 1998). We conduct an experiment on image classification with Resnet18 (He et al., 2016) on CIFAR10 (Krizhevsky et al., 2009). |
| Dataset Splits | Yes | Right: Validation loss by step, using a subset of 64 images excluded from the training data. We use the standard Resnet setup on CIFAR10 by replacing the 7x7 convolution with a 3x3 one, and removing the maxpool in the first convolutional block. We use the standard data augmentation and a batch size of 2048. |
| Hardware Specification | Yes | We performed all optimization runs in TensorFlow 2, each with 40 Intel Xeon Gold 6248 CPUs and 2 Nvidia Volta V100 GPUs. |
| Software Dependencies | Yes | We performed all optimization runs in TensorFlow 2 |
| Experiment Setup | Yes | In every experiment, we tuned the hyperparameters of each optimizer using a genetic algorithm of 10 generations and 32 individuals per generation (16 individuals per generation for the Resnet CIFAR10 task). The tuned hyperparameters can be found in Table 6. For Image Classification, we used a batch size of 2048. For Image Generation, parameters were initialized with LeCun normal initialization. |
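The extracted Algorithm 1 can be illustrated with a minimal NumPy sketch. To stay self-contained, it makes assumptions not taken from the paper: the hypernetwork G(θ) is reduced to a simple diagonal matrix diag(θ), the meta-update uses plain SGD in place of Adam, and the meta-gradient ∇_θ ℓ is computed analytically via the chain rule for this toy G. Names like `meta_lr` and `lodo` are illustrative only.

```python
import numpy as np

def lodo(f, grad_f, x0, steps=2000, meta_lr=1e-3, alpha0=0.1, beta=0.9):
    """Toy LODO loop: G(theta) = diag(theta), SGD stands in for Adam."""
    x = x0.astype(float).copy()
    theta = np.full(x.size, alpha0)   # G(theta_0) ~ alpha0 * I
    m = np.zeros(x.size)              # momentum buffer m_0 = 0
    for _ in range(steps):
        x_new = x - theta * m         # x_{t+1} = x_t - G(theta_t) m_t
        g_new = grad_f(x_new)         # gradient of loss after the step
        # Chain rule for this diagonal G: d loss / d theta = -m * grad_f(x_new)
        theta -= meta_lr * (-m * g_new)        # meta-update (SGD, not Adam)
        m = beta * m + (1.0 - beta) * g_new    # m_{t+1} = b m_t + (1-b) grad
        x = x_new
    return x

# Usage: minimize a mildly ill-conditioned quadratic bowl
c = np.array([1.0, 2.0, 4.0])
f = lambda x: 0.5 * np.sum(c * x * x)
grad_f = lambda x: c * x
x_min = lodo(f, grad_f, np.ones(3))
```

Note the sign of the meta-update: when the momentum m and the post-step gradient are aligned, −m ⊙ ∇f is negative, so θ grows and the per-coordinate step size accelerates, which is the adaptive behavior the algorithm's meta-learning is meant to capture.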