Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Convergence of Adam Under Relaxed Assumptions

Authors: Haochuan Li, Alexander Rakhlin, Ali Jadbabaie

NeurIPS 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we provide a rigorous proof of convergence of the Adaptive Moment Estimation (Adam) algorithm for a wide class of optimization objectives. The key to our analysis is a new proof of boundedness of gradients along the optimization trajectory of Adam, under a generalized smoothness assumption according to which the local smoothness (i.e., Hessian norm when it exists) is bounded by a sub-quadratic function of the gradient norm. Moreover, we propose a variance-reduced version of Adam with an accelerated gradient complexity of O(ϵ⁻³).
Researcher Affiliation | Academia | Haochuan Li (MIT), Alexander Rakhlin (MIT), Ali Jadbabaie (MIT)
Pseudocode | Yes | Algorithm 1: Adam
Open Source Code | No | The paper mentions a 'PyTorch implementation' as a default choice for λ, but does not provide a statement about releasing the authors' own code for the methodology or analysis described in this paper.
Open Datasets | Yes | Based on our preliminary experimental results on CIFAR-10 shown in Figure 1, the performance of Adam is not very sensitive to the choice of λ.
Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, or detailed methodology) is provided for the CIFAR-10 dataset used in Figure 1.
Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types, or memory amounts) are mentioned for the experiments; only general statements such as 'training deep neural networks' and 'training transformers' appear.
Software Dependencies | No | The paper mentions a 'PyTorch implementation' but does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | Figure 1: Test errors of different models trained on CIFAR-10 using the Adam optimizer with β = 0.9, β_sq = 0.999, η = 0.001 and different values of λ.
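
For context on the generalized smoothness assumption quoted in the Research Type row, a common sub-quadratic condition of this kind (an illustrative form with assumed constants L₀, L₁ and exponent ρ, not necessarily the paper's exact statement) bounds the local Hessian norm by the gradient norm:

```latex
% Illustrative sub-quadratic smoothness condition; L_0, L_1, and \rho are
% assumed symbols for this sketch, not the paper's notation.
\left\| \nabla^2 f(x) \right\| \le L_0 + L_1 \left\| \nabla f(x) \right\|^{\rho},
\qquad 0 \le \rho < 2 .
```

Here ρ = 1 recovers the familiar (L₀, L₁)-smoothness condition, while the requirement ρ < 2 corresponds to the "sub-quadratic" wording in the abstract.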
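
The Experiment Setup row quotes the Figure 1 hyperparameters. A minimal sketch of that optimizer configuration in PyTorch follows, assuming the paper's λ corresponds to the eps argument of torch.optim.Adam; the ResNet-18 model and the particular λ value are placeholders, not taken from the paper:

```python
# Minimal sketch (not the authors' code) of the Figure 1 optimizer settings:
# beta = 0.9, beta_sq = 0.999, eta = 0.001, with lambda swept over several values.
import torch
import torchvision

# Placeholder CIFAR-10 model; Figure 1 reports "different models".
model = torchvision.models.resnet18(num_classes=10)

lam = 1e-8  # one hypothetical value from the lambda sweep (PyTorch's default eps)
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,             # eta
    betas=(0.9, 0.999),  # (beta, beta_sq)
    eps=lam,             # lambda, assuming it maps to PyTorch's eps
)
```

With eps left at its default (1e-8), this would correspond to the 'PyTorch implementation' default choice of λ that the Open Source Code row refers to.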