ADAM Optimization with Adaptive Batch Selection
Authors: Gyu Yeol Kim, Min-hwan Oh
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Numerical Experiments): To evaluate our proposed algorithm, AdamCB, we conduct experiments using deep neural networks, including multilayer perceptrons (MLPs) and convolutional neural networks (CNNs), on three benchmark datasets: MNIST, Fashion-MNIST, and CIFAR-10. Comparisons are made with Adam, AdamX, and AdamBS, with all experiments implemented in PyTorch. Performance is assessed by plotting training and test losses over epochs, with training loss computed on the full training set and test loss computed on the held-out validation set. Results are averaged over five runs with different random seeds, with standard deviations reported. Figures 1 and 2 show that AdamCB consistently outperforms Adam, AdamX, and AdamBS, achieving faster reductions in both training and test losses across all datasets. |
| Researcher Affiliation | Academia | Gyu Yeol Kim Seoul National University Seoul, South Korea EMAIL; Min-hwan Oh Seoul National University Seoul, South Korea EMAIL |
| Pseudocode | Yes | Algorithm 1: Adam with Combinatorial Bandit Sampling (AdamCB); Algorithm 2: Batch-Selection; Algorithm 3: Weight-Update |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code, a link to a repository, or mention of code in supplementary materials. |
| Open Datasets | Yes | To evaluate our proposed algorithm, AdamCB, we conduct experiments using deep neural networks, including multilayer perceptrons (MLPs) and convolutional neural networks (CNNs), on three benchmark datasets: MNIST, Fashion-MNIST, and CIFAR-10. |
| Dataset Splits | Yes | Performance is assessed by plotting training and test losses over epochs, with training loss calculated on the full dataset and test loss calculated on the held-out validation data set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | All experiments are implemented in PyTorch. (No PyTorch version is specified, nor are other software dependencies listed with versions.) |
| Experiment Setup | Yes | Table 2 (hyper-parameters used for experiments): learning rate αt = 0.001; exponential decay rates for momentum β1,1, β2 = 0.9, 0.999; decay rate λ for β1,t (convergence guarantee) = 1 − 1e-8; ϵ for non-zero division = 1e-8; loss function: cross-entropy; batch size K = 128; exploration parameter γ = 0.4; number of epochs = 10 |
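
The three pseudocode components listed above (AdamCB = Adam step + bandit-driven Batch-Selection + Weight-Update) can be illustrated with a minimal sketch. This is not the paper's exact Algorithms 1-3: the EXP3-style mixture sampling, the importance-weighted weight update, the toy regression problem, and all function names are assumptions made for illustration. Where possible it reuses Table 2 values (α = 0.001, β1 = 0.9, β2 = 0.999, ϵ = 1e-8, γ = 0.4); the batch size is shrunk from K = 128 to K = 4 to fit a tiny demo dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hyper-parameters from the paper's Table 2 (K reduced for this toy demo)
ALPHA, BETA1, BETA2, EPS = 1e-3, 0.9, 0.999, 1e-8
GAMMA, K = 0.4, 4

def select_batch(weights, gamma, k, rng):
    # EXP3-style mixture: exploit current weights, explore uniformly with rate gamma
    n = len(weights)
    p = (1 - gamma) * weights / weights.sum() + gamma / n
    idx = rng.choice(n, size=k, replace=False, p=p)
    return idx, p

def update_weights(weights, idx, losses, p, gamma):
    # Importance-weighted exponential update: high-loss examples gain sampling weight
    n = len(weights)
    est = np.zeros(n)
    est[idx] = losses / p[idx]        # inverse-propensity loss estimate
    weights = weights * np.exp(gamma * est / n)
    return weights / weights.max()    # rescale to keep weights bounded

def adam_step(theta, g, m, v, t):
    # Standard bias-corrected Adam update on a scalar parameter
    m = BETA1 * m + (1 - BETA1) * g
    v = BETA2 * v + (1 - BETA2) * g ** 2
    m_hat = m / (1 - BETA1 ** t)
    v_hat = v / (1 - BETA2 ** t)
    return theta - ALPHA * m_hat / (np.sqrt(v_hat) + EPS), m, v

# Toy problem: fit y = 3x by least squares over n examples
n = 16
x = rng.normal(size=n)
y = 3.0 * x
theta, m, v = 0.0, 0.0, 0.0
w = np.ones(n)
for t in range(1, 8001):
    idx, p = select_batch(w, GAMMA, K, rng)
    residual = theta * x[idx] - y[idx]
    g = np.mean(2.0 * residual * x[idx])   # batch gradient of the squared error
    theta, m, v = adam_step(theta, g, m, v, t)
    w = update_weights(w, idx, residual ** 2, p, GAMMA)
```

After the loop, `theta` sits close to the true slope 3.0. The exploration term γ/n keeps every example's sampling probability strictly positive, which is what makes the inverse-propensity loss estimate well defined.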