A Momentumized, Adaptive, Dual Averaged Gradient Method
Authors: Aaron Defazio, Samy Jelassi
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present results across a large number of problems across both categories to validate the general purpose utility of the MADGRAD approach. In our experiments we use the most common step-size reduction scheme used in the literature for each respective problem. For all algorithms, we performed a learning rate and decay sweep on a grid of {1×10^i, 2.5×10^i, 5×10^i} for a range of i large enough to ensure the best parameters for each problem and method were considered. We present the results from the best learning rate and decay for each method when considering test set performance. |
| Researcher Affiliation | Collaboration | Aaron Defazio, Facebook AI Research, New York; Samy Jelassi, Princeton University, Princeton |
| Pseudocode | Yes | Algorithm 1 MADGRAD. Require: step-size sequence γ_k, momentum sequence c_k, initial point x_0, epsilon ϵ. 1: s_0 = 0, ν_0 = 0 2: for k = 0, …, T do 3: sample ξ_k and set g_k = ∇f(x_k, ξ_k) 4: λ_k = γ_k √(k+1) 5: s_{k+1} = s_k + λ_k g_k 6: ν_{k+1} = ν_k + λ_k (g_k ∘ g_k) 7: z_{k+1} = x_0 − s_{k+1} / (∛ν_{k+1} + ϵ) 8: x_{k+1} = (1 − c_{k+1}) x_k + c_{k+1} z_{k+1} 9: end for 10: return x_T |
| Open Source Code | Yes | An implementation is available at https://github.com/facebookresearch/madgrad |
| Open Datasets | Yes | CIFAR10 (Krizhevsky, 2009) is an established baseline within the deep learning community due to its manageable size and representative performance within the class of data-limited supervised image classification problems. The ImageNet problem (Krizhevsky et al., 2012) is a larger problem more representative of image classification problems encountered in industrial applications, where a large number of classes and higher-resolution input images are encountered. The fastMRI Knee challenge (Zbontar et al., 2018) is a recently proposed large-scale image-to-image problem. For a machine translation baseline we trained our model on the IWSLT14 German-to-English dataset (Cettolo et al., 2014), using a popular LSTM variant introduced by Wiseman and Rush (2016). We performed our experiments using the RoBERTa variant of BERT_BASE (Liu et al., 2019), a 110M parameter transformer model. |
| Dataset Splits | Yes | Following standard practice, we apply a data-augmentation step consisting of random horizontal flipping, 4px padding followed by random cropping to 32px at training time only. Our setup used data preprocessing consisting of a mean [0.485, 0.456, 0.406] and std [0.229, 0.224, 0.225] normalization of the three respective color channels, followed by a RandomResizedCrop PyTorch operation to reduce the resolution to 224 pixels, followed by a random 50% chance of horizontal flipping. For test set evaluation, a resize to 256 pixels followed by a center crop to 224 pixels is used instead. |
| Hardware Specification | Yes | GPUs: 1x V100; GPUs: 8x V100 |
| Software Dependencies | No | The AdaGrad implementations available in major deep learning frameworks (PyTorch, TensorFlow) contain the mirror descent form only. Our implementation used fairseq defaults except for the parameters listed below. |
| Experiment Setup | Yes | Hyper-parameters: Architecture: PreActResNet152; Epochs: 300; GPUs: 1x V100; Batch Size per GPU: 128; LR schedule: 150-225 tenthing; Seeds: 10. Per-method (LR, Decay): MADGRAD (2.5e-4, 0.0001); AdaGrad (0.01, 0.0001); Adam (0.00025, 0.0001); SGD (0.1, 0.0001). |
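The pseudocode row above can be sketched as a single NumPy update step. This is our own illustrative reading of Algorithm 1, not the authors' released implementation (that lives at the linked facebookresearch/madgrad repository); the function and variable names here are ours, and λ_k = γ_k√(k+1) is taken from the paper's weighting scheme.

```python
import numpy as np

def madgrad_step(x0, s, nu, x, grad, gamma, k, c, eps=1e-6):
    """One MADGRAD update (sketch of Algorithm 1; names are ours).

    x0    : initial point, kept fixed across all steps (dual averaging anchors here)
    s     : running weighted sum of gradients
    nu    : running weighted sum of squared gradients
    x     : current iterate
    grad  : stochastic gradient evaluated at x
    gamma : base step size; c : momentum parameter in (0, 1]
    """
    lam = gamma * np.sqrt(k + 1)          # dual-averaging weight lambda_k
    s = s + lam * grad                    # s_{k+1} = s_k + lambda_k g_k
    nu = nu + lam * grad * grad           # nu_{k+1}: elementwise squared gradient
    z = x0 - s / (np.cbrt(nu) + eps)      # cube-root denominator (not the usual sqrt)
    x = (1 - c) * x + c * z               # momentum as interpolation toward z
    return s, nu, x
```

Note the two features that distinguish MADGRAD from Adam-style methods in this sketch: the iterate is always re-derived from the fixed anchor x_0 (dual averaging), and the adaptivity uses a cube root rather than a square root.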