Adam-family Methods for Nonsmooth Optimization with Convergence Guarantees
Authors: Nachuan Xiao, Xiaoyin Hu, Xin Liu, Kim-Chuan Toh
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the numerical performance of our analyzed Adam-family methods for training nonsmooth neural networks. All the numerical experiments in this section are conducted on a server equipped with an Intel Xeon 6342 CPU and an NVIDIA GeForce RTX 3090 GPU, running Python 3.8 and PyTorch 1.9.0. ... We investigate the performance of these compared Adam-family methods on training ResNet-50 (He et al., 2016) for image classification tasks on the CIFAR-10 and CIFAR-100 data sets (Krizhevsky et al., 2009). ... The numerical results are presented in Figure 1 and Figure 2. These figures demonstrate that our proposed Adam-family methods with diminishing stepsizes exhibit the same performance as the existing Adam-family methods available in the PyTorch and torch_optimizer packages. ... The numerical results are presented in Figure 3. These figures indicate that our proposed SGD-C and ADAM-C converge successfully and achieve high accuracy. ... We conduct numerical experiments using the LeNet (LeCun et al., 1998) for the classification task on the MNIST data set (LeCun, 1998). |
| Researcher Affiliation | Academia | Nachuan Xiao EMAIL Institute of Operations Research and Analytics National University of Singapore 3 Research Link, Singapore, 117602 Xiaoyin Hu EMAIL School of Computer and Computing Science Hangzhou City University Hangzhou, China, 310015 Xin Liu EMAIL State Key Laboratory of Scientific and Engineering Computing Academy of Mathematics and Systems Science, Chinese Academy of Sciences Beijing, China, 100190 Kim-Chuan Toh EMAIL Department of Mathematics and Institute of Operations Research and Analytics National University of Singapore 10 Lower Kent Ridge Road, Singapore, 119076 |
| Pseudocode | Yes | Algorithm 1 Stochastic subgradient-based Adam for nonsmooth optimization problems. Require: Initial point x_0 ∈ R^n, m_0 ∈ R^n and v_0 ∈ R^n_+, parameters α ≥ 0, τ_1, τ_2, ε > 0, and χ as a selection of stochastic subgradients; 1: Set k = 0; 2: while not terminated do 3: Independently sample s_k ∼ P, and compute g_k = χ(x_k, s_k); 4: Choose the stepsize η_k; 5: Update the momentum term by m_{k+1} = (1 − τ_1 η_k) m_k + τ_1 η_k g_k; 6: Update the estimator v_{k+1} from g_k, m_{k+1} and v_k by v_{k+1} = v_k − τ_2 η_k U(g_k, m_{k+1}, v_k); 7: Compute the scaling parameters ρ_{m,k+1} and ρ_{v,k+1}; 8: Update x_k by x_{k+1} = x_k − η_k (ρ_{v,k+1} \|v_{k+1}\| + ε)^{−1/2} (ρ_{m,k+1} m_{k+1} + α g_k); 9: k = k + 1; 10: end while 11: Return x_k. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We investigate the performance of these compared Adam-family methods on training ResNet-50 (He et al., 2016) for image classification tasks on the CIFAR-10 and CIFAR-100 data sets (Krizhevsky et al., 2009). ... We conduct numerical experiments using the LeNet (LeCun et al., 1998) for the classification task on the MNIST data set (LeCun, 1998). ... We first evaluate the performance of (SGD-C) and (ADAM-C) by training a language-translation model (Vaswani et al., 2017) on the Multi30k data set (Elliott et al., 2016). ... (c) Train perplexity on Penn Treebank with LSTM |
| Dataset Splits | No | We set the batch size to 128 for all test instances... In our numerical experiments, we set the batch size to 64 for all test instances... In all the numerical experiments, we consistently train our models for 200 epochs while employing a batch size of 128. The paper mentions batch sizes and training epochs, but does not provide specific details on how the datasets were split into training, validation, or test sets, nor does it explicitly reference any standard splits. |
| Hardware Specification | Yes | All the numerical experiments in this section are conducted on a server equipped with an Intel Xeon 6342 CPU and an NVIDIA GeForce RTX 3090 GPU, running Python 3.8 and PyTorch 1.9.0. |
| Software Dependencies | Yes | All the numerical experiments in this section are conducted on a server equipped with an Intel Xeon 6342 CPU and an NVIDIA GeForce RTX 3090 GPU, running Python 3.8 and PyTorch 1.9.0. |
| Experiment Setup | Yes | We set the batch size to 128 for all test instances and select the regularization parameter ε as ε = 10^{-15}. Furthermore, at the k-th epoch, we choose the stepsize as η_k = η_0/(k+1) for all the tested algorithms. Following the settings in Castera et al. (2021), we use a grid search to find a suitable initial stepsize η_0 and parameters τ_1, τ_2 for the Adam-family methods provided in PyTorch. ... Moreover, to investigate the performance of our proposed Adam-family methods with the Nesterov momentum term, in each test instance, we choose the Nesterov momentum parameter α as 0 and 0.1, respectively. ... In our numerical experiments, we set the batch size to 64 for all test instances and select the regularization parameter ε = 10^{-15} for all the Adam-family methods. ... In all the numerical experiments, we consistently train our models for 200 epochs while employing a batch size of 128. |
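The Algorithm 1 template quoted in the Pseudocode row can be sketched in plain Python. The paper leaves the estimator update U and the scaling parameters ρ abstract; the sketch below is one hedged instantiation, assuming the Adam-style choice U(g, m, v) = v − g² (so v tracks a moving average of squared subgradients) and fixing ρ_m = ρ_v = 1. The toy objective f(x) = |x|, the noise model, and `adam_family_step` are illustrative names, not from the paper; the diminishing stepsize η_k = η_0/(k+1) and ε = 10^{-15} follow the Experiment Setup row.

```python
import random

def adam_family_step(x, m, v, g, eta, tau1=0.9, tau2=0.999,
                     alpha=0.0, eps=1e-15, rho_m=1.0, rho_v=1.0):
    """One iteration of the Algorithm 1 template, coordinate-wise on lists.

    Assumed instantiation: U(g, m, v) = v - g**2, so the v-update
    v <- v - tau2*eta*(v - g**2) is an Adam-style moving average of g**2;
    rho_m and rho_v are kept at 1 (no bias correction) for simplicity.
    """
    # Step 5: momentum update m_{k+1} = (1 - tau1*eta)*m_k + tau1*eta*g_k
    m_new = [(1 - tau1 * eta) * mi + tau1 * eta * gi for mi, gi in zip(m, g)]
    # Step 6: estimator update v_{k+1} = v_k - tau2*eta*U(g_k, m_{k+1}, v_k)
    v_new = [vi - tau2 * eta * (vi - gi ** 2) for vi, gi in zip(v, g)]
    # Step 8: x_{k+1} = x_k - eta*(rho_v*|v| + eps)^{-1/2}*(rho_m*m + alpha*g)
    x_new = [xi - eta * (rho_m * mi + alpha * gi) / (rho_v * abs(vi) + eps) ** 0.5
             for xi, mi, vi, gi in zip(x, m_new, v_new, g)]
    return x_new, m_new, v_new

# Toy run: minimize the nonsmooth f(x) = |x| from noisy subgradients,
# with the diminishing stepsize eta_k = eta_0/(k+1) from the experiments.
random.seed(0)
x, m, v = [2.0], [0.0], [0.0]
eta0 = 0.5
for k in range(200):
    g = [(1.0 if x[0] >= 0 else -1.0) + 0.1 * random.gauss(0.0, 1.0)]
    x, m, v = adam_family_step(x, m, v, g, eta0 / (k + 1))
```

With the Nesterov-style parameter α > 0 (the paper tests α = 0.1), the raw subgradient g_k is mixed into the update direction alongside the momentum term, exactly as in step 8 of the pseudocode.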