Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Deep Double Descent via Smooth Interpolation
Authors: Matteo Gamba, Erik Englesson, Mårten Björkman, Hossein Azizpour
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we quantify sharpness of fit of the training data interpolated by neural network functions, by studying the loss landscape w.r.t. the input variable locally to each training point, over volumes around cleanly- and noisily-labelled training samples, as we systematically increase the number of model parameters and training epochs. Our findings show that loss sharpness in the input space follows both model- and epoch-wise double descent, with worse peaks observed around noisy labels. |
| Researcher Affiliation | Academia | Matteo Gamba EMAIL KTH Royal Institute of Technology Erik Englesson EMAIL KTH Royal Institute of Technology Mårten Björkman EMAIL KTH Royal Institute of Technology Hossein Azizpour EMAIL KTH Royal Institute of Technology |
| Pseudocode | Yes | In this section, we provide pseudocode for the algorithm used for generating geodesic paths, used for Monte Carlo integration. Let x0 ∈ R^d denote a training point, which we use as the starting point of geodesic paths πp emanating from x0. ... Algorithm 1 Generate a geodesic path π emanating from a training point x0. |
| Open Source Code | Yes | 1Source code to reproduce our results available at https://github.com/magamba/double_descent |
| Open Datasets | Yes | Specifically, we train a family of ConvNets formed by 4 convolutional stages of controlled base width [w, 2w, 4w, 8w], for w = 1, . . . , 64, on the CIFAR-10 dataset with 20% noisy training labels and on CIFAR-100. ... We train the transformer networks on the WMT 14 En-Fr task (Macháček & Bojar, 2014), as well as IWSLT 14 De-En (Cettolo et al., 2012). |
| Dataset Splits | Yes | To tune the training hyperparameters of all networks, a validation split of 1000 samples was drawn uniformly at random from the training split of CIFAR-10 and CIFAR-100. ... The training set of WMT 14 is reduced by randomly sampling 200k sentences, fixed for all models. |
| Hardware Specification | Yes | Our experiments are conducted on a local cluster equipped with NVIDIA Tesla A100s with 40GB onboard memory. |
| Software Dependencies | Yes | All learned layers are initialized with PyTorch's default weight initialization (version 1.11.0). |
| Experiment Setup | Yes | All ConvNets are trained for 4k epochs with SGD with momentum 0.9, fixed learning rate 1e-3, batch size 128, and no weight decay. All learned layers are initialized with PyTorch's default weight initialization (version 1.11.0). To stabilize prolonged training in the absence of batch normalization, we use learning rate warmup: starting from a base value of 1e-4, the learning rate is linearly increased to 1e-3 during the first 5 epochs of training, after which it remains constant at 1e-3. ... All ResNets are trained for 4k epochs using Adam with base learning rate 1e-4, batch size 128, and no weight decay. All learned layers are initialized with PyTorch's default initialization (version 1.11.0). All residual networks are trained with data augmentation, consisting of 4 pixel random shifts, and random horizontal flips. |
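The learning rate warmup described in the Experiment Setup row (a linear ramp from 1e-4 to 1e-3 over the first 5 epochs, then constant) can be sketched as a plain per-epoch schedule. This is an illustrative sketch under those stated hyperparameters; the function name `warmup_lr` is ours, not from the authors' released code.

```python
def warmup_lr(epoch, base_lr=1e-4, target_lr=1e-3, warmup_epochs=5):
    """Return the learning rate for a given epoch.

    Linearly ramps from base_lr to target_lr over the first
    `warmup_epochs` epochs, then holds the rate constant.
    """
    if epoch >= warmup_epochs:
        return target_lr
    # Fraction of the warmup phase completed at this epoch
    t = epoch / warmup_epochs
    return base_lr + t * (target_lr - base_lr)
```

In a training loop, this value would be assigned to the optimizer's learning rate at the start of each epoch (e.g. via the `lr` field of a PyTorch optimizer's parameter groups).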