Faster Double Adaptive Gradient Methods
Authors: Feihu Huang, Yuning Luo
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct some experiments on image classification and language modeling tasks to verify efficiency of our proposed methods. |
| Researcher Affiliation | Academia | 1College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China 2MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing, China |
| Pseudocode | Yes | Algorithm 1: Double Adaptive SGD (2Ada SGD) Algorithm; Algorithm 2: Double Adaptive SPIDER (2Ada SPIDER) Algorithm |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for their methodology is publicly available. |
| Open Datasets | Yes | In the experiment, we conduct image classification task on CIFAR-10 (Krizhevsky, Hinton et al. 2009) and Imagenet (Deng et al. 2009) datasets, respectively. In the experiment, we conduct language modeling task on the Penn-Treebank (Marcus, Santorini, and Marcinkiewicz 1993) and Wiki Text2 (Merity et al. 2016) datasets, respectively. |
| Dataset Splits | Yes | In the experiment, we conduct image classification task on CIFAR-10 (Krizhevsky, Hinton et al. 2009) and Imagenet (Deng et al. 2009) datasets, respectively. Specifically, we train a 3-layer Convolutional Neural Network (CNN) on the CIFAR-10 dataset and train the ResNet18 (He et al. 2016) on the Imagenet dataset. Specifically, we will train a 2-layer LSTM (Hochreiter and Schmidhuber 1997) on the Penn-Treebank dataset and train a 2-layer Transformer (Vaswani 2017) on the WikiText2 dataset. |
| Hardware Specification | Yes | All experiments are run over a machine with Intel(R) Xeon(R) Platinum 8352V CPU and 1 Nvidia RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions neural network models like CNN, ResNet18, LSTM, and Transformer, but does not specify the version numbers of any software libraries, frameworks, or programming languages used (e.g., PyTorch, TensorFlow, Python version). |
| Experiment Setup | Yes | For the learning rates and other hyper-parameters, we do grid search and report the best one for each optimizer. We set γ = 10^{-3}, m = 50 in our 2Ada SGD algorithm, and set γ = 10^{-2}, b = 64 in our 2Ada SPIDER algorithm. In other algorithms, we set the basic learning rate as 0.001, and the basic batch size as 64. Here the neural network architecture of the 3-layer CNN is provided in Table 2. |
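
The paper reports only that hyper-parameters were chosen by grid search, with the best configuration kept per optimizer. The sketch below illustrates that selection procedure in plain Python; the `train_eval` objective and the candidate values are hypothetical stand-ins (the paper does not publish its grid), with the toy optimum chosen to mirror the reported defaults of learning rate 0.001 and batch size 64.

```python
from itertools import product

def grid_search(train_eval, grid):
    """Try every hyper-parameter combination and keep the best.

    train_eval: callable mapping a config dict to a validation score
                (higher is better); a stand-in for a full training run.
    grid: dict mapping hyper-parameter name -> list of candidate values.
    """
    best_score, best_cfg = float("-inf"), None
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = train_eval(cfg)
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score

# Hypothetical toy objective: peaks at lr=1e-3, batch_size=64,
# matching the defaults the paper reports for the baselines.
def toy_eval(cfg):
    return -abs(cfg["lr"] - 1e-3) - 1e-3 * abs(cfg["batch_size"] - 64)

grid = {"lr": [1e-1, 1e-2, 1e-3], "batch_size": [32, 64, 128]}
best_cfg, _ = grid_search(toy_eval, grid)
print(best_cfg)  # {'lr': 0.001, 'batch_size': 64}
```

In a real run, `train_eval` would train the model to completion under `cfg` and return a held-out metric, so the exhaustive loop can be expensive; the paper's use of a single best configuration per optimizer is consistent with this style of search.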