Explicit and Implicit Graduated Optimization in Deep Neural Networks

Authors: Naoki Sato, Hideaki Iiduka

AAAI 2025

Reproducibility assessment (Variable: Result. LLM response):
Research Type: Experimental. This paper experimentally evaluates the performance of the explicit graduated optimization algorithm with an optimal noise scheduling derived from a previous study and discusses its limitations. The evaluation uses traditional benchmark functions and empirical loss functions for modern neural network architectures. In addition, this paper extends the implicit graduated optimization algorithm, which is based on the fact that stochastic noise in the optimization process of SGD implicitly smooths the objective function, to SGD with momentum, analyzes its convergence, and demonstrates its effectiveness through experiments on image classification tasks with ResNet architectures.
Researcher Affiliation: Academia. Naoki Sato, Hideaki Iiduka (Meiji University); EMAIL, EMAIL
Pseudocode: Yes. Algorithm 1: Explicit Graduated Optimization; Algorithm 2: SGD with constant learning rate; Algorithm 3: Implicit Graduated Optimization with SGD; Algorithm 4: Stochastic Heavy Ball (SHB); Algorithm 5: Normalized Stochastic Heavy Ball (NSHB); Algorithm 6: Implicit Graduated Optimization with SHB
Open Source Code: Yes. Code: https://github.com/iiduka-researches/igo-aaai25
Open Datasets: Yes. Figure 2 shows the results of using Algorithm 1 to train ResNet18 on the CIFAR100 dataset for 200 epochs. Figure 4 plots the accuracy in testing and the loss function value in training ResNet18 on the CIFAR100 dataset with SHB versus the number of epochs. ...we trained ResNet34 (He et al. 2016) on the ImageNet dataset (Deng et al. 2009) with SHB for 100 epochs.
Dataset Splits: Yes. Figure 2 shows the results of using Algorithm 1 to train ResNet18 on the CIFAR100 dataset for 200 epochs. Figure 3 shows the results of training ResNet18 on the CIFAR100 dataset with Algorithm 3 with a constant learning rate and constant batch size... ...we trained ResNet34 (He et al. 2016) on the ImageNet dataset (Deng et al. 2009) with SHB for 100 epochs.
Hardware Specification: Yes. The experimental environment was an Intel Core i9-13900KF CPU. The experimental environment for training the DNN was as follows: NVIDIA GeForce RTX 4090 (1 GPU).
Software Dependencies: No. The paper does not explicitly mention specific software dependencies with version numbers.
Experiment Setup: Yes. Figure 2 shows the results of using Algorithm 1 to train ResNet18 on the CIFAR100 dataset for 200 epochs. Note that it used SGD with ∇f_{S_t}(x_t + δ_m u_t) as the search direction for decreasing (δ_m)_{m∈[M]}, where u_t ∈ R^d is Gaussian noise and M = 200; i.e., the noise δ_m was decreased each epoch. In a 200-epoch training, methods 2, 3, and 4 update the hyperparameters every epoch. In method 1, the learning rate and the batch size are fixed at 0.1 and 128, respectively. In method 2, the initial learning rate is 0.1 and the batch size is fixed at 128. In method 3, the learning rate is fixed at 0.1 and the initial batch size is 32. In method 4, the initial learning rate is 0.1 and the initial batch size is 32.
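The explicit graduated optimization step described in the setup above (an SGD step whose gradient is evaluated at a Gaussian-perturbed point x_t + δ_m u_t, with δ_m decreased each epoch) can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1: the toy quadratic loss, the linear decay of δ_m, the learning rate reuse, and the inner step count are all assumptions; the paper derives a different, provably optimal noise schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

def explicit_graduated_step(x, stochastic_grad, delta, lr=0.1):
    """One SGD step on the delta-smoothed objective.

    The gradient is evaluated at the Gaussian-perturbed point
    x + delta * u, which approximates a step on a smoothed loss.
    """
    u = rng.standard_normal(x.shape)  # Gaussian noise u_t
    return x - lr * stochastic_grad(x + delta * u)

# Toy stand-in for the mini-batch gradient: f(x) = ||x||^2 / 2,
# so grad f(x) = x (hypothetical; not the paper's loss).
grad = lambda x: x

x = np.ones(4)
M = 200  # outer iterations, matching the 200-epoch schedule
for m in range(M):
    delta = 1.0 - m / M  # assumed linear decay of the noise level
    for _ in range(10):  # assumed number of inner SGD steps per noise level
        x = explicit_graduated_step(x, grad, delta)
```

As the noise level shrinks, the smoothed objective approaches the original one and the iterates settle near its minimizer; on this toy quadratic, x ends close to the origin.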