Explicit and Implicit Graduated Optimization in Deep Neural Networks
Authors: Naoki Sato, Hideaki Iiduka
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper experimentally evaluates the explicit graduated optimization algorithm with the optimal noise schedule derived in a previous study and discusses its limitations. The evaluation uses traditional benchmark functions and the empirical loss functions of modern neural network architectures. In addition, the paper extends the implicit graduated optimization algorithm, which builds on the fact that stochastic noise in the optimization process of SGD implicitly smooths the objective function, to SGD with momentum, analyzes its convergence, and demonstrates its effectiveness through experiments on image classification tasks with ResNet architectures. |
| Researcher Affiliation | Academia | Naoki Sato, Hideaki Iiduka (Meiji University); EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Explicit Graduated Optimization; Algorithm 2: SGD with constant learning rate; Algorithm 3: Implicit Graduated Optimization with SGD; Algorithm 4: Stochastic Heavy Ball (SHB); Algorithm 5: Normalized Stochastic Heavy Ball (NSHB); Algorithm 6: Implicit Graduated Optimization with SHB |
| Open Source Code | Yes | Code: https://github.com/iiduka-researches/igo-aaai25 |
| Open Datasets | Yes | Figure 2 shows the results of using Algorithm 1 to train ResNet18 on the CIFAR100 dataset for 200 epochs. Figure 4 plots the accuracy in testing and the loss function value in training ResNet18 on the CIFAR100 dataset with SHB versus the number of epochs. ...we trained ResNet34 (He et al. 2016) on the ImageNet dataset (Deng et al. 2009) with SHB for 100 epochs. |
| Dataset Splits | Yes | Figure 2 shows the results of using Algorithm 1 to train ResNet18 on the CIFAR100 dataset for 200 epochs. Figure 3 shows the results of training ResNet18 on the CIFAR100 dataset with Algorithm 3 with a constant learning rate and constant batch size... ...we trained ResNet34 (He et al. 2016) on the ImageNet dataset (Deng et al. 2009) with SHB for 100 epochs. |
| Hardware Specification | Yes | The experimental environment was an Intel Core i9-13900KF CPU. The experimental environment for training the DNN was as follows: NVIDIA GeForce RTX 4090 (1 GPU). |
| Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers. |
| Experiment Setup | Yes | Figure 2 shows the results of using Algorithm 1 to train ResNet18 on the CIFAR100 dataset for 200 epochs. Note that it used SGD with $\nabla f_{S_t}(x_t + \delta_m u_t)$ as the search direction, with a decreasing noise sequence $(\delta_m)_{m \in [M]}$, where $u_t \in \mathbb{R}^d$ is Gaussian noise and $M = 200$; i.e., the noise $\delta_m$ was decreased each epoch. In a 200-epoch training run, methods 2, 3, and 4 update the hyperparameters every epoch. In method 1, the learning rate and the batch size are fixed at 0.1 and 128, respectively. In method 2, the initial learning rate is 0.1 and the batch size is fixed at 128. In method 3, the learning rate is fixed at 0.1 and the initial batch size is 32. In method 4, the initial learning rate is 0.1 and the initial batch size is 32. |
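The explicit scheme described in the Experiment Setup row (SGD that follows the gradient at a Gaussian-perturbed point, with a noise scale that shrinks each epoch) can be sketched as below. This is a minimal illustration, not the paper's implementation: the linear decay schedule for the noise, the unit-norm noise direction, and the toy quadratic in the usage example are all assumptions made here for clarity; the paper derives a specific optimal schedule.

```python
import numpy as np

def explicit_graduated_sgd(grad_f, x0, lr=0.1, M=200, delta0=1.0, rng=None):
    """Sketch of explicit graduated optimization: at outer step m, take an
    SGD step along grad_f(x + delta_m * u), where u is a Gaussian direction
    and delta_m decays toward 0 so the smoothed problem approaches the
    original one."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    for m in range(M):
        # Assumed linear decay; the paper uses its own derived schedule.
        delta_m = delta0 * (1.0 - m / M)
        u = rng.standard_normal(x.shape)
        u /= np.linalg.norm(u)  # unit-norm direction (an assumption here)
        x = x - lr * grad_f(x + delta_m * u)
    return x

# Usage on a toy quadratic f(x) = ||x||^2, i.e. grad_f(x) = 2x:
x_final = explicit_graduated_sgd(lambda x: 2.0 * x, np.ones(2) * 5.0, rng=0)
```

On this convex toy problem the iterate contracts toward the minimizer while the vanishing noise scale removes the perturbation's influence near the end of training; the graduated structure only matters for the nonconvex objectives studied in the paper.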