Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Deep Double Descent via Smooth Interpolation
Authors: Matteo Gamba, Erik Englesson, Mårten Björkman, Hossein Azizpour
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we quantify sharpness of fit of the training data interpolated by neural network functions, by studying the loss landscape w.r.t. the input variable locally to each training point, over volumes around cleanly- and noisily-labelled training samples, as we systematically increase the number of model parameters and training epochs. Our findings show that loss sharpness in the input space follows both model- and epoch-wise double descent, with worse peaks observed around noisy labels. |
| Researcher Affiliation | Academia | Matteo Gamba EMAIL KTH Royal Institute of Technology Erik Englesson EMAIL KTH Royal Institute of Technology Mårten Björkman EMAIL KTH Royal Institute of Technology Hossein Azizpour EMAIL KTH Royal Institute of Technology |
| Pseudocode | Yes | In this section, we provide pseudocode for the algorithm used for generating geodesic paths, used for Monte Carlo integration. Let x0 ∈ R^d denote a training point, which we use as the starting point of geodesic paths πp emanating from x0. ... Algorithm 1 Generate a geodesic path π emanating from a training point x0. |
| Open Source Code | Yes | 1Source code to reproduce our results available at https://github.com/magamba/double_descent |
| Open Datasets | Yes | Specifically, we train a family of ConvNets formed by 4 convolutional stages of controlled base width [w, 2w, 4w, 8w], for w = 1, . . . , 64, on the CIFAR-10 dataset with 20% noisy training labels and on CIFAR-100. ... We train the transformer networks on the WMT 14 En-Fr task (Macháček & Bojar, 2014), as well as IWSLT 14 De-En (Cettolo et al., 2012). |
| Dataset Splits | Yes | To tune the training hyperparameters of all networks, a validation split of 1000 samples was drawn uniformly at random from the training split of CIFAR-10 and CIFAR-100. ... The training set of WMT 14 is reduced by randomly sampling 200k sentences, fixed for all models. |
| Hardware Specification | Yes | Our experiments are conducted on a local cluster equipped with NVIDIA Tesla A100s with 40GB onboard memory. |
| Software Dependencies | Yes | All learned layers are initialized with PyTorch's default weight initialization (version 1.11.0). |
| Experiment Setup | Yes | All ConvNets are trained for 4k epochs with SGD with momentum 0.9, fixed learning rate 1e-3, batch size 128, and no weight decay. All learned layers are initialized with PyTorch's default weight initialization (version 1.11.0). To stabilize prolonged training in the absence of batch normalization, we use learning rate warmup: starting from a base value of 1e-4, the learning rate is linearly increased to 1e-3 during the first 5 epochs of training, after which it remains constant at 1e-3. ... All ResNets are trained for 4k epochs using Adam with base learning rate 1e-4, batch size 128, and no weight decay. All learned layers are initialized with PyTorch's default initialization (version 1.11.0). All residual networks are trained with data augmentation, consisting of 4 pixel random shifts, and random horizontal flips. |
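The learning rate warmup described in the Experiment Setup row (a linear ramp from 1e-4 to 1e-3 over the first 5 epochs, then constant) can be sketched as a plain per-epoch schedule. This is an illustrative sketch under those stated hyperparameters; the function name `warmup_lr` is ours, not from the authors' released code.

```python
def warmup_lr(epoch, base_lr=1e-4, target_lr=1e-3, warmup_epochs=5):
    """Return the learning rate for a given epoch.

    Linearly ramps from base_lr to target_lr over the first
    `warmup_epochs` epochs, then holds the rate constant.
    """
    if epoch >= warmup_epochs:
        return target_lr
    # Fraction of the warmup phase completed at this epoch
    t = epoch / warmup_epochs
    return base_lr + t * (target_lr - base_lr)
```

In a training loop, this value would be assigned to the optimizer's learning rate at the start of each epoch (e.g. via the `lr` field of a PyTorch optimizer's parameter groups).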