Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Deep Double Descent via Smooth Interpolation

Authors: Matteo Gamba, Erik Englesson, Mårten Björkman, Hossein Azizpour

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this work, we quantify the sharpness of fit of the training data interpolated by neural network functions, by studying the loss landscape w.r.t. the input variable locally to each training point, over volumes around cleanly- and noisily-labelled training samples, as we systematically increase the number of model parameters and training epochs. Our findings show that loss sharpness in the input space follows both model- and epoch-wise double descent, with worse peaks observed around noisy labels.
Researcher Affiliation Academia Matteo Gamba (KTH Royal Institute of Technology), Erik Englesson (KTH Royal Institute of Technology), Mårten Björkman (KTH Royal Institute of Technology), Hossein Azizpour (KTH Royal Institute of Technology)
Pseudocode Yes In this section, we provide pseudocode for the algorithm used for generating geodesic paths, used for Monte Carlo integration. Let x0 ∈ R^d denote a training point, which we use as the starting point of geodesic paths π_p emanating from x0. ... Algorithm 1 Generate a geodesic path π emanating from a training point x0.
Open Source Code Yes Source code to reproduce our results available at https://github.com/magamba/double_descent
Open Datasets Yes Specifically, we train a family of ConvNets formed by 4 convolutional stages of controlled base width [w, 2w, 4w, 8w], for w = 1, ..., 64, on the CIFAR-10 dataset with 20% noisy training labels and on CIFAR-100. ... We train the transformer networks on the WMT 14 En-Fr task (Macháček & Bojar, 2014), as well as IWSLT 14 De-En (Cettolo et al., 2012).
Dataset Splits Yes To tune the training hyperparameters of all networks, a validation split of 1000 samples was drawn uniformly at random from the training split of CIFAR-10 and CIFAR-100. ... The training set of WMT 14 is reduced by randomly sampling 200k sentences, fixed for all models.
Hardware Specification Yes Our experiments are conducted on a local cluster equipped with NVIDIA Tesla A100s with 40GB onboard memory.
Software Dependencies Yes All learned layers are initialized with PyTorch's default weight initialization (version 1.11.0).
Experiment Setup Yes All ConvNets are trained for 4k epochs with SGD with momentum 0.9, fixed learning rate 1e-3, batch size 128, and no weight decay. All learned layers are initialized with PyTorch's default weight initialization (version 1.11.0). To stabilize prolonged training in the absence of batch normalization, we use learning rate warmup: starting from a base value of 1e-4, the learning rate is linearly increased to 1e-3 during the first 5 epochs of training, after which it remains constant at 1e-3. ... All ResNets are trained for 4k epochs using Adam with base learning rate 1e-4, batch size 128, and no weight decay. All learned layers are initialized with PyTorch's default initialization (version 1.11.0). All residual networks are trained with data augmentation, consisting of 4 pixel random shifts, and random horizontal flips.
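The warmup schedule quoted in the experiment setup is simple enough to sketch. The snippet below is an illustrative reconstruction, not the authors' released code (their repository is linked above): a linear learning-rate warmup from 1e-4 to 1e-3 over the first 5 epochs, after which the rate stays constant. The function name `warmup_lr` and its parameter names are our own, chosen for this sketch.

```python
def warmup_lr(epoch, base_lr=1e-4, target_lr=1e-3, warmup_epochs=5):
    """Linearly ramp the learning rate from base_lr to target_lr over
    the first warmup_epochs epochs, then hold it at target_lr."""
    if epoch >= warmup_epochs:
        return target_lr
    return base_lr + (target_lr - base_lr) * epoch / warmup_epochs

# Rate at the start of each of the first six epochs:
schedule = [warmup_lr(e) for e in range(6)]
# epoch 0 starts at 1e-4; from epoch 5 onward the rate stays at 1e-3
```

In a PyTorch training loop this would typically be wired in through `torch.optim.lr_scheduler.LambdaLR`, passing `warmup_lr(epoch) / target_lr` as the multiplicative factor applied to the optimizer's base rate of 1e-3.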