Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Faking Interpolation Until You Make It
Authors: Alasdair Paren, Rudra P. K. Poudel, M. Pawan Kumar
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide rigorous experimentation on a range of problems. From our empirical analysis we demonstrate the effectiveness of our approach, which outperforms other single hyperparameter optimisation methods. [...] In this section we test the hypothesis that a Polyak-like step size in combination with AOVs can produce high accuracy models in the non-interpolating setting. We investigate this through rigorous experiments comparing ALI-G+ against a wide range of single hyperparameter optimisation algorithms on a variety of problems. |
| Researcher Affiliation | Collaboration | Alasdair Paren EMAIL Department of Engineering Science University of Oxford Oxford, UK; Rudra P. K. Poudel EMAIL Cambridge Research Laboratory, Toshiba Europe Ltd, Cambridge, UK.; M. Pawan Kumar EMAIL Department of Engineering Science University of Oxford Oxford, UK. |
| Pseudocode | Yes | Algorithm 1 ALI-G with AOVs; Algorithm 2 ALI-G+ Algorithm |
| Open Source Code | Yes | Code available at https://github.com/Alasdair-P/alig_plus |
| Open Datasets | Yes | Specifically we use the SVHN (Netzer et al., 2011), CIFAR10, CIFAR100 (Krizhevsky, 2009) and Tiny ImageNet data sets. [...] We show that our approach scales to large problems by providing results on the ImageNet data set. [...] The binary classification with radial basis functions tasks use the mushrooms and ijcnn dataset from the LIBSVM library of SVM problems (Chang & Lin, 2011). |
| Dataset Splits | Yes | For the SVHN data set we use the split proposed in Berrada et al. (2020) resulting in 598k training, 6k validation and 26k test samples. [...] The ImageNet data set (Deng et al., 2009) contains 1.2M large RGB images of various sizes split over 1000 classes. For our experiments we use the following data augmentation. All images are normalised per channel, randomly cropped to 224x224 pixels and horizontal flips are applied with probability 0.5. For validation a centre crop is used and no flips are performed. |
| Hardware Specification | Yes | All experiments are conducted in PyTorch (Paszke et al., 2017) and are performed on a single GPU except for the ImageNet experiments that use two. [...] In Table 2 we include the run time for training a ResNet18 split across two NVIDIA TITAN XP GPUs on ImageNet. |
| Software Dependencies | No | The paper mentions PyTorch: "All experiments are conducted in PyTorch (Paszke et al., 2017)", but does not specify a version number for it or any other software library or dependency. |
| Experiment Setup | Yes | A fixed batch size of 128 and an epoch budget of 200 are used for all experiments. As is common for deep learning experiments we accelerate SGD, ALI-G and ALI-G+ with a Nesterov momentum of 0.9. [...] Furthermore, to save computation we i) avoid calculating f(w_t) exactly and instead approximate this online during each epoch and ii) we use w^{k−1}_T in place of w^{k−1} in line 3 of Algorithm 2. [...] For SGDstep we use the learning rate schedules detailed in He et al. (2016). Note, a different learning rate schedule is suggested in this setting, again highlighting the weakness of using SGDstep where a good learning rate schedule is not known in advance. [...] For all optimisation algorithms the problem regularisation hyperparameter is selected from λ ∈ {10^−3, 10^−4, 10^−5, 0}. ALI-G uses constraint-based regularisation (see Section 3); r was selected from r ∈ {50, 100, 200, ∞}. All other hyperparameters are left at their default values. |
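The "Polyak-like step size" quoted in the Research Type row can be illustrated with a minimal sketch: a step length proportional to the gap between the current loss and an (assumed known) optimal value, divided by the squared gradient norm and clipped by a maximal learning rate, in the spirit of ALI-G (Berrada et al., 2020). This is an illustration only, not the authors' exact ALI-G+ update; the names `polyak_step`, `f_star`, `max_lr`, `eps` and the quadratic test problem are assumptions made for the example.

```python
import numpy as np

def polyak_step(w, grad, loss, f_star=0.0, max_lr=0.1, eps=1e-8):
    """One Polyak-style update: step size (f(w) - f*) / ||grad||^2,
    clipped at a maximal learning rate (illustrative sketch only)."""
    gamma = min((loss - f_star) / (np.dot(grad, grad) + eps), max_lr)
    return w - gamma * grad

# Minimise f(w) = 0.5 * ||w||^2, whose optimal value is f* = 0.
w = np.array([4.0, -2.0])
for _ in range(50):
    grad = w                    # gradient of f at w is w itself
    loss = 0.5 * np.dot(w, w)   # current value of f(w)
    w = polyak_step(w, grad, loss)

print(np.linalg.norm(w))  # after 50 steps the iterate is close to 0
```

On this toy problem the ratio (f(w) − f*)/||∇f(w)||² is constant at 0.5, so the clip at `max_lr` is always active and the update reduces to plain gradient descent with step 0.1; the clipping only matters near interpolation, which is the regime the paper's non-interpolating extension targets.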