Adam-family Methods for Nonsmooth Optimization with Convergence Guarantees

Authors: Nachuan Xiao, Xiaoyin Hu, Xin Liu, Kim-Chuan Toh

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate the numerical performance of our analyzed Adam-family methods for training nonsmooth neural networks. All the numerical experiments in this section are conducted on a server equipped with an Intel Xeon 6342 CPU and an NVIDIA GeForce RTX 3090 GPU, running Python 3.8 and PyTorch 1.9.0. ... We investigate the performance of these compared Adam-family methods on training ResNet-50 (He et al., 2016) for image classification tasks on the CIFAR-10 and CIFAR-100 data sets (Krizhevsky et al., 2009). ... The numerical results are presented in Figure 1 and Figure 2. These figures demonstrate that our proposed Adam-family methods with diminishing stepsizes exhibit the same performance as the existing Adam-family methods available in the PyTorch and torch-optimizer packages. ... The numerical results are presented in Figure 3. These figures indicate that our proposed SGD-C and ADAM-C converge successfully and achieve high accuracy. ... We conduct numerical experiments using LeNet (LeCun et al., 1998) for the classification task on the MNIST data set (LeCun, 1998).
Researcher Affiliation | Academia | Nachuan Xiao EMAIL Institute of Operations Research and Analytics, National University of Singapore, 3 Research Link, Singapore, 117602; Xiaoyin Hu EMAIL School of Computer and Computing Science, Hangzhou City University, Hangzhou, China, 310015; Xin Liu EMAIL State Key Laboratory of Scientific and Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China, 100190; Kim-Chuan Toh EMAIL Department of Mathematics and Institute of Operations Research and Analytics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore, 119076
Pseudocode | Yes | Algorithm 1 Stochastic subgradient-based Adam for nonsmooth optimization problems. Require: Initial point x0 ∈ R^n, m0 ∈ R^n and v0 ∈ R^n_+, parameters α ≥ 0 and τ1, τ2, ε > 0, and χ as a selection of stochastic subgradients; 1: Set k = 0; 2: while not terminated do 3: Independently sample sk ∼ P, and compute gk = χ(xk, sk); 4: Choose the stepsize ηk; 5: Update the momentum term by mk+1 = (1 − τ1ηk)mk + τ1ηk gk; 6: Update the estimator vk+1 from gk, mk+1 and vk by vk+1 = vk − τ2ηk U(gk, mk+1, vk); 7: Compute the scaling parameters ρm,k+1 and ρv,k+1; 8: Update xk by xk+1 = xk − ηk(ρv,k+1|vk+1| + ε)^(−1/2)(ρm,k+1 mk+1 + α gk); 9: k = k + 1; 10: end while 11: Return xk.
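A minimal NumPy sketch of one iteration of Algorithm 1, under illustrative assumptions: the estimator update is taken as U(g, m, v) = v − g², and the scaling parameters ρm, ρv are fixed to 1 (the paper leaves U, ρm, ρv as general framework choices, so these concrete picks are assumptions, as are the default parameter values).

```python
import numpy as np

def adam_nonsmooth_step(x, m, v, g, eta, tau1=0.9, tau2=0.999,
                        alpha=0.0, eps=1e-15):
    """One iteration of Algorithm 1 (illustrative sketch).

    g is a stochastic subgradient chi(x, s) at x. The estimator
    update U and the scaling parameters rho_m, rho_v below are
    plain-Adam-style assumptions, not the paper's general framework.
    """
    # Step 5: momentum update m_{k+1} = (1 - tau1*eta)*m + tau1*eta*g
    m_new = (1 - tau1 * eta) * m + tau1 * eta * g
    # Step 6: estimator update with the assumed U(g, m, v) = v - g**2,
    # i.e. v_{k+1} = v - tau2*eta*(v - g**2)
    v_new = v - tau2 * eta * (v - g**2)
    # Step 7: scaling parameters; rho_m = rho_v = 1 here for simplicity
    rho_m, rho_v = 1.0, 1.0
    # Step 8: parameter update with optional Nesterov-style term alpha*g
    x_new = x - eta * (rho_v * np.abs(v_new) + eps) ** -0.5 \
                * (rho_m * m_new + alpha * g)
    return x_new, m_new, v_new
```

Iterating this map with subgradients of a nonsmooth objective (e.g. g = sign(x) for f(x) = |x|) drives x toward the minimizer while v stays nonnegative, mirroring the v0 ∈ R^n_+ requirement.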
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | We investigate the performance of these compared Adam-family methods on training ResNet-50 (He et al., 2016) for image classification tasks on the CIFAR-10 and CIFAR-100 data sets (Krizhevsky et al., 2009). ... We conduct numerical experiments using LeNet (LeCun et al., 1998) for the classification task on the MNIST data set (LeCun, 1998). ... We first evaluate the performance of (SGD-C) and (ADAM-C) by training a language-translation model (Vaswani et al., 2017) on the Multi30k data set (Elliott et al., 2016). ... (c) Train perplexity on Penn Treebank with LSTM
Dataset Splits | No | We set the batch size to 128 for all test instances... In our numerical experiments, we set the batch size to 64 for all test instances... In all the numerical experiments, we consistently train our models for 200 epochs while employing a batch size of 128. The paper mentions batch sizes and training epochs, but does not provide specific details on how the datasets were split into training, validation, or test sets, nor does it explicitly reference any standard splits.
Hardware Specification | Yes | All the numerical experiments in this section are conducted on a server equipped with an Intel Xeon 6342 CPU and an NVIDIA GeForce RTX 3090 GPU, running Python 3.8 and PyTorch 1.9.0.
Software Dependencies | Yes | All the numerical experiments in this section are conducted on a server equipped with an Intel Xeon 6342 CPU and an NVIDIA GeForce RTX 3090 GPU, running Python 3.8 and PyTorch 1.9.0.
Experiment Setup | Yes | We set the batch size to 128 for all test instances and select the regularization parameter ε as ε = 10^(−15). Furthermore, at the k-th epoch, we choose the stepsize as ηk = η0/(k+1) for all the tested algorithms. Following the settings in Castera et al. (2021), we use a grid search to find a suitable initial stepsize η0 and parameters τ1, τ2 for the Adam-family methods provided in PyTorch. ... Moreover, to investigate the performance of our proposed Adam-family methods with the Nesterov momentum term, in each test instance, we choose the Nesterov momentum parameter α as 0 and 0.1, respectively. ... In our numerical experiments, we set the batch size to 64 for all test instances and select the regularization parameter ε = 10^(−15) for all the Adam-family methods. ... In all the numerical experiments, we consistently train our models for 200 epochs while employing a batch size of 128.
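The diminishing stepsize schedule ηk = η0/(k+1), applied once per epoch, can be sketched as follows; the value η0 = 0.1 is an illustrative assumption (the paper selects η0 by grid search).

```python
# Per-epoch diminishing stepsize schedule eta_k = eta_0 / (k + 1).
# eta_0 = 0.1 is an illustrative assumption; the paper tunes it
# by grid search, and the 200-epoch horizon matches its setup.
eta0 = 0.1
stepsizes = [eta0 / (k + 1) for k in range(200)]  # one value per epoch
```

This schedule satisfies the usual stochastic-approximation conditions (stepsizes tend to zero while their sum diverges), which is what the paper's convergence analysis for diminishing stepsizes relies on.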