From Optimization Dynamics to Generalization Bounds via Łojasiewicz Gradient Inequality

Authors: Fusheng Liu, Haizhao Yang, Soufiane Hayou, Qianxiao Li

TMLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental First, we design a finite sample test algorithm to verify the Uniform-LGI condition along the training path. Our numerical results suggest the Uniform-LGI condition is generally satisfied when training machine learning models... We perform experiments on the CIFAR10 dataset (first two classes).
Researcher Affiliation Academia Fusheng Liu EMAIL Institute of Data Science National University of Singapore Haizhao Yang EMAIL Department of Mathematics University of Maryland College Park Soufiane Hayou EMAIL Department of Mathematics National University of Singapore Qianxiao Li EMAIL Department of Mathematics National University of Singapore
Pseudocode Yes Algorithm 1 Finite sample test for Uniform-LGI Input: loss function L(w); a collection of parameters w(0), w(1), w(2), . . . , w(K); the optimal loss value L(w*); start point K0; step s
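The quoted input line describes the paper's Algorithm 1 only at the interface level. The sketch below is a hypothetical finite-sample check of the Uniform-LGI condition ||∇L(w)|| ≥ c·(L(w) − L*)^μ along a recorded training path; the log–log regression estimate of μ and the per-sample choice of c are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def finite_sample_lgi_test(grad_norms, losses, loss_opt, k0=0):
    """Hypothetical sketch: estimate (mu, c) such that
    ||grad L(w_k)|| >= c * (L(w_k) - L*)^mu holds at every
    recorded iterate k >= k0 along the training path."""
    g = np.asarray(grad_norms[k0:], dtype=float)
    gap = np.asarray(losses[k0:], dtype=float) - loss_opt
    mask = (g > 0) & (gap > 0)              # logs need strictly positive values
    x, y = np.log(gap[mask]), np.log(g[mask])
    mu, _ = np.polyfit(x, y, 1)             # slope of the log-log fit estimates mu
    c = np.min(g[mask] / gap[mask] ** mu)   # largest c for which the bound holds on the sample
    return mu, c
```

As a sanity check, for the quadratic L(w) = w² with L* = 0 the gradient norm is 2·(L − L*)^{1/2}, so the fit recovers μ ≈ 1/2 and c ≈ 2.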
Open Source Code No The paper does not contain an explicit statement about the release of source code or a link to a code repository for the described methodology.
Open Datasets Yes We train three neural network models: two-layer multilayer perceptron (MLP) with width 100 (no bias) on the MNIST dataset (LeCun et al., 1998); ResNet18 (He et al., 2016), Wide ResNet-16-8 (Zagoruyko & Komodakis, 2016) on the CIFAR10 dataset (Krizhevsky et al., 2009).
Dataset Splits No For the experiments in Figure 3 (a) and (b), we randomly choose the sample from the whole 10000 training images with size n = 1000, 2000, . . . , 10000, where each class has the same sample size. For the experiment in Figure 3 (c), we use the whole 10000 training images. This describes sample selection for training, but does not provide explicit train/validation/test splits.
Hardware Specification No The paper mentions training neural network models (MLP, ResNet18, Wide ResNet) on MNIST and CIFAR10 datasets using SGD, but does not specify any particular hardware like GPU models, CPU types, or cloud computing environments.
Software Dependencies No The paper describes the use of neural network models and optimizers like SGD, but does not specify any software libraries (e.g., PyTorch, TensorFlow) or their version numbers.
Experiment Setup Yes For each experiment, we train the network using SGD (no momentum) with random shuffling, batch size 64 and fixed learning rate 0.01... For the MLP model, we stop the training with 1000 epochs... We optimize the cross entropy loss by full batch gradient descent with random initialization and learning rate 0.01. We stop training once the training loss is less than 0.001 or epoch reaches 20000. We do not use weight decay, dropout or batch normalization.
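The stated hyperparameters amount to a small training configuration. A rough PyTorch expression of it is sketched below; the framework choice, the MLP layer shapes, and all identifiers are assumptions, since the paper does not name a library.

```python
import torch

# Hypothetical config fragment matching the stated setup:
# two-layer MLP, width 100, no bias; SGD without momentum,
# lr 0.01; no weight decay, dropout, or batch normalization.
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 100, bias=False),  # MNIST input 28x28 -> width 100
    torch.nn.ReLU(),
    torch.nn.Linear(100, 10, bias=False),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.0, weight_decay=0.0)
loss_fn = torch.nn.CrossEntropyLoss()  # cross entropy loss, as stated
```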