Adam-family Methods for Nonsmooth Optimization with Convergence Guarantees
Authors: Nachuan Xiao, Xiaoyin Hu, Xin Liu, Kim-Chuan Toh
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the numerical performance of our analyzed Adam-family methods for training nonsmooth neural networks. All the numerical experiments in this section are conducted on a server equipped with an Intel Xeon 6342 CPU and an NVIDIA GeForce RTX 3090 GPU, running Python 3.8 and PyTorch 1.9.0. ... We investigate the performance of these compared Adam-family methods on training ResNet-50 (He et al., 2016) for image classification tasks on the CIFAR-10 and CIFAR-100 data sets (Krizhevsky et al., 2009). ... The numerical results are presented in Figure 1 and Figure 2. These figures demonstrate that our proposed Adam-family methods with diminishing stepsizes exhibit the same performance as the existing Adam-family methods available in the PyTorch and torch_optimizer packages. ... The numerical results are presented in Figure 3. These figures indicate that our proposed SGD-C and ADAM-C converge successfully and achieve high accuracy. ... We conduct numerical experiments using the LeNet (LeCun et al., 1998) for the classification task on the MNIST data set (LeCun, 1998). |
| Researcher Affiliation | Academia | Nachuan Xiao EMAIL Institute of Operations Research and Analytics National University of Singapore 3 Research Link, Singapore, 117602 Xiaoyin Hu EMAIL School of Computer and Computing Science Hangzhou City University Hangzhou, China, 310015 Xin Liu EMAIL State Key Laboratory of Scientific and Engineering Computing Academy of Mathematics and Systems Science, Chinese Academy of Sciences Beijing, China, 100190 Kim-Chuan Toh EMAIL Department of Mathematics and Institute of Operations Research and Analytics National University of Singapore 10 Lower Kent Ridge Road, Singapore, 119076 |
| Pseudocode | Yes | Algorithm 1 Stochastic subgradient-based Adam for nonsmooth optimization problems. Require: Initial point x_0 ∈ R^n, m_0 ∈ R^n and v_0 ∈ R^n_+, parameters α ≥ 0, τ_1, τ_2, ε > 0, and χ as a selection of stochastic subgradients; 1: Set k = 0; 2: while not terminated do 3: Independently sample s_k ∼ P, and compute g_k = χ(x_k, s_k); 4: Choose the stepsize η_k; 5: Update the momentum term by m_{k+1} = (1 − τ_1 η_k) m_k + τ_1 η_k g_k; 6: Update the estimator v_{k+1} from g_k, m_{k+1} and v_k by v_{k+1} = v_k − τ_2 η_k U(g_k, m_{k+1}, v_k); 7: Compute the scaling parameters ρ_{m,k+1} and ρ_{v,k+1}; 8: Update x_k by x_{k+1} = x_k − η_k (ρ_{v,k+1} \|v_{k+1}\| + ε)^{−1/2} (ρ_{m,k+1} m_{k+1} + α g_k); 9: k = k + 1; 10: end while 11: Return x_k. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We investigate the performance of these compared Adam-family methods on training ResNet-50 (He et al., 2016) for image classification tasks on the CIFAR-10 and CIFAR-100 data sets (Krizhevsky et al., 2009). ... We conduct numerical experiments using the LeNet (LeCun et al., 1998) for the classification task on the MNIST data set (LeCun, 1998). ... We first evaluate the performance of (SGD-C) and (ADAM-C) by training a language-translation model (Vaswani et al., 2017) on the Multi30k data set (Elliott et al., 2016). ... (c) Train perplexity on Penn Treebank with LSTM |
| Dataset Splits | No | We set the batch size to 128 for all test instances... In our numerical experiments, we set the batch size to 64 for all test instances... In all the numerical experiments, we consistently train our models for 200 epochs while employing a batch size of 128. The paper mentions batch sizes and training epochs, but does not provide specific details on how the datasets were split into training, validation, or test sets, nor does it explicitly reference any standard splits. |
| Hardware Specification | Yes | All the numerical experiments in this section are conducted on a server equipped with an Intel Xeon 6342 CPU and an NVIDIA GeForce RTX 3090 GPU, running Python 3.8 and PyTorch 1.9.0. |
| Software Dependencies | Yes | All the numerical experiments in this section are conducted on a server equipped with an Intel Xeon 6342 CPU and an NVIDIA GeForce RTX 3090 GPU, running Python 3.8 and PyTorch 1.9.0. |
| Experiment Setup | Yes | We set the batch size to 128 for all test instances and select the regularization parameter ε as ε = 10^{-15}. Furthermore, at the k-th epoch, we choose the stepsize as η_k = η_0/(k+1) for all the tested algorithms. Following the settings in Castera et al. (2021), we use a grid search to find a suitable initial stepsize η_0 and parameters τ_1, τ_2 for the Adam-family methods provided in PyTorch. ... Moreover, to investigate the performance of our proposed Adam-family methods with the Nesterov momentum term, in each test instance, we choose the Nesterov momentum parameter α as 0 and 0.1, respectively. ... In our numerical experiments, we set the batch size to 64 for all test instances and select the regularization parameter ε = 10^{-15} for all the Adam-family methods. ... In all the numerical experiments, we consistently train our models for 200 epochs while employing a batch size of 128. |
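The Algorithm 1 template quoted in the Pseudocode row can be sketched in plain Python. The paper leaves the estimator update U and the scaling parameters ρ abstract; the sketch below is one hedged instantiation, assuming the Adam-style choice U(g, m, v) = v − g² (so v tracks a moving average of squared subgradients) and fixing ρ_m = ρ_v = 1. The toy objective f(x) = |x|, the noise model, and `adam_family_step` are illustrative names, not from the paper; the diminishing stepsize η_k = η_0/(k+1) and ε = 10^{-15} follow the Experiment Setup row.

```python
import random

def adam_family_step(x, m, v, g, eta, tau1=0.9, tau2=0.999,
                     alpha=0.0, eps=1e-15, rho_m=1.0, rho_v=1.0):
    """One iteration of the Algorithm 1 template, coordinate-wise on lists.

    Assumed instantiation: U(g, m, v) = v - g**2, so the v-update
    v <- v - tau2*eta*(v - g**2) is an Adam-style moving average of g**2;
    rho_m and rho_v are kept at 1 (no bias correction) for simplicity.
    """
    # Step 5: momentum update m_{k+1} = (1 - tau1*eta)*m_k + tau1*eta*g_k
    m_new = [(1 - tau1 * eta) * mi + tau1 * eta * gi for mi, gi in zip(m, g)]
    # Step 6: estimator update v_{k+1} = v_k - tau2*eta*U(g_k, m_{k+1}, v_k)
    v_new = [vi - tau2 * eta * (vi - gi ** 2) for vi, gi in zip(v, g)]
    # Step 8: x_{k+1} = x_k - eta*(rho_v*|v| + eps)^{-1/2}*(rho_m*m + alpha*g)
    x_new = [xi - eta * (rho_m * mi + alpha * gi) / (rho_v * abs(vi) + eps) ** 0.5
             for xi, mi, vi, gi in zip(x, m_new, v_new, g)]
    return x_new, m_new, v_new

# Toy run: minimize the nonsmooth f(x) = |x| from noisy subgradients,
# with the diminishing stepsize eta_k = eta_0/(k+1) from the experiments.
random.seed(0)
x, m, v = [2.0], [0.0], [0.0]
eta0 = 0.5
for k in range(200):
    g = [(1.0 if x[0] >= 0 else -1.0) + 0.1 * random.gauss(0.0, 1.0)]
    x, m, v = adam_family_step(x, m, v, g, eta0 / (k + 1))
```

With the Nesterov-style parameter α > 0 (the paper tests α = 0.1), the raw subgradient g_k is mixed into the update direction alongside the momentum term, exactly as in step 8 of the pseudocode.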