From Optimization Dynamics to Generalization Bounds via Łojasiewicz Gradient Inequality
Authors: Fusheng Liu, Haizhao Yang, Soufiane Hayou, Qianxiao Li
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | First, we design a finite sample test algorithm to verify the Uniform-LGI condition along the training path. Our numerical results suggest the Uniform-LGI condition is generally satisfied when training machine learning models... We perform experiments on the CIFAR10 dataset (first two classes). |
| Researcher Affiliation | Academia | Fusheng Liu (Institute of Data Science, National University of Singapore); Haizhao Yang (Department of Mathematics, University of Maryland, College Park); Soufiane Hayou (Department of Mathematics, National University of Singapore); Qianxiao Li (Department of Mathematics, National University of Singapore) |
| Pseudocode | Yes | Algorithm 1 Finite sample test for Uniform-LGI. Input: loss function L(w); a collection of parameters w(0), w(1), w(2), . . . , w(K); the optimal loss value L(w*); start point K0; step s |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We train three neural network models: two-layer multilayer perceptron (MLP) with width 100 (no bias) on the MNIST dataset (LeCun et al., 1998); ResNet18 (He et al., 2016), Wide ResNet-16-8 (Zagoruyko & Komodakis, 2016) on the CIFAR10 dataset (Krizhevsky et al., 2009). |
| Dataset Splits | No | For the experiments in Figure 3 (a) and (b), we randomly choose the sample from the whole 10000 training images with size n = 1000, 2000, . . . , 10000, where each class has the same sample size. For the experiment in Figure 3 (c), we use the whole 10000 training images. This describes sample selection for training, but does not provide explicit train/validation/test splits. |
| Hardware Specification | No | The paper mentions training neural network models (MLP, ResNet18, Wide ResNet) on MNIST and CIFAR10 datasets using SGD, but does not specify any particular hardware like GPU models, CPU types, or cloud computing environments. |
| Software Dependencies | No | The paper describes the use of neural network models and optimizers like SGD, but does not specify any software libraries (e.g., PyTorch, TensorFlow) or their version numbers. |
| Experiment Setup | Yes | For each experiment, we train the network using SGD (no momentum) with random shuffling, batch size 64 and fixed learning rate 0.01... For the MLP model, we stop the training with 1000 epochs... We optimize the cross entropy loss by full batch gradient descent with random initialization and learning rate 0.01. We stop training once the training loss is less than 0.001 or epoch reaches 20000. We do not use weight decay, dropout or batch normalization. |