ADAM Optimization with Adaptive Batch Selection

Authors: Gyu Yeol Kim, Min-hwan Oh

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 Numerical Experiments: To evaluate our proposed algorithm, AdamCB, we conduct experiments using deep neural networks, including multilayer perceptrons (MLP) and convolutional neural networks (CNN), on three benchmark datasets: MNIST, Fashion-MNIST, and CIFAR-10. Comparisons are made with Adam, AdamX, and AdamBS, with all experiments implemented in PyTorch. Performance is assessed by plotting training and test losses over epochs, with training loss calculated on the full dataset and test loss calculated on the held-out validation dataset. Results represent the average of five runs with different random seeds, including standard deviations. Figures 1 and 2 show that AdamCB consistently outperforms Adam, AdamX, and AdamBS, demonstrating faster reductions in both training and test losses across all datasets.
Researcher Affiliation | Academia | Gyu Yeol Kim, Seoul National University, Seoul, South Korea, EMAIL; Min-hwan Oh, Seoul National University, Seoul, South Korea, EMAIL
Pseudocode | Yes | Algorithm 1: Adam with Combinatorial Bandit Sampling (AdamCB); Algorithm 2: Batch-Selection; Algorithm 3: Weight-Update
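The report only names the algorithms without reproducing them. As a rough illustration of how bandit-based batch selection and weight updates can work, the following plain-Python sketch uses an EXP3-style scheme (mixing weight-proportional sampling with uniform exploration, then applying importance-weighted exponential updates). The function names and the exact update rule are assumptions for illustration, not the paper's Algorithms 2 and 3; AdamCB's actual procedure may differ.

```python
import math
import random

def select_batch(weights, K, gamma):
    """Hypothetical batch-selection sketch (EXP3-style, not the paper's
    exact Algorithm 2): mix normalized sample weights with uniform
    exploration, then draw K indices from the resulting distribution."""
    n = len(weights)
    total = sum(weights)
    probs = [(1 - gamma) * w / total + gamma / n for w in weights]
    batch = random.choices(range(n), weights=probs, k=K)  # sampled with replacement for simplicity
    return batch, probs

def update_weights(weights, probs, batch, losses, gamma):
    """Hypothetical weight-update sketch (not the paper's exact
    Algorithm 3): exponentially down-weight sampled examples using an
    importance-weighted (unbiased) loss estimate."""
    n = len(weights)
    for idx, loss in zip(batch, losses):
        est = loss / probs[idx]  # importance weighting keeps the estimate unbiased
        weights[idx] *= math.exp(-gamma * est / n)
    return weights
```

With the exploration parameter γ = 0.4 from the report's Table 2, every sample retains probability at least γ/n of being drawn, which is what keeps the loss estimates well-behaved in EXP3-style analyses.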
Open Source Code | No | The paper does not provide an explicit statement about releasing source code, a link to a repository, or mention of code in supplementary materials.
Open Datasets | Yes | To evaluate our proposed algorithm, AdamCB, we conduct experiments using deep neural networks, including multilayer perceptrons (MLP) and convolutional neural networks (CNN), on three benchmark datasets: MNIST, Fashion-MNIST, and CIFAR-10.
Dataset Splits | Yes | Performance is assessed by plotting training and test losses over epochs, with training loss calculated on the full dataset and test loss calculated on the held-out validation dataset.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | All experiments are implemented in PyTorch, but no version number for PyTorch is specified, nor are other software dependencies listed with versions.
Experiment Setup | Yes | Table 2: Hyper-parameters used for experiments. Learning rate αt: 0.001; exponential decay rates for momentum β1,1, β2: 0.9, 0.999; decay rate λ for β1,t (convergence guarantee): 1 - 1e-8; ϵ for non-zero division: 1e-8; loss function: Cross-Entropy; batch size K: 128; exploration parameter γ: 0.4; number of epochs: 10
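Using the Table 2 values, a minimal plain-Python sketch of a single Adam update with the decayed momentum coefficient β1,t = β1,1 · λ^(t-1) (the standard schedule used for convergence guarantees) is shown below. This is an illustrative baseline Adam step only, not the paper's AdamCB, which combines this update with the bandit-based batch selection described above; the function name and scalar-list representation are assumptions for illustration.

```python
import math

# Hyper-parameters from Table 2 of the report.
LR, BETA1, BETA2 = 0.001, 0.9, 0.999
LAM, EPS = 1 - 1e-8, 1e-8

def adam_step(theta, grad, m, v, t):
    """One Adam update over a list of scalar parameters (sketch only;
    AdamCB additionally couples this with bandit batch selection)."""
    beta1_t = BETA1 * LAM ** (t - 1)  # decayed momentum coefficient
    new_theta, new_m, new_v = [], [], []
    for th, g, mi, vi in zip(theta, grad, m, v):
        mi = beta1_t * mi + (1 - beta1_t) * g   # first-moment estimate
        vi = BETA2 * vi + (1 - BETA2) * g * g   # second-moment estimate
        m_hat = mi / (1 - BETA1 ** t)           # bias correction
        v_hat = vi / (1 - BETA2 ** t)
        th = th - LR * m_hat / (math.sqrt(v_hat) + EPS)
        new_theta.append(th)
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v
```

At t = 1 the bias corrections make the effective step size approximately equal to the learning rate times the gradient's sign, which is why the first update moves a parameter by roughly LR regardless of gradient magnitude.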