A Unified Approach to Controlling Implicit Regularization via Mirror Descent

Authors: Haoyuan Sun, Khashayar Gatmiry, Kwangjun Ahn, Navid Azizan

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through comprehensive experiments, we demonstrate that MD is a versatile method to produce learned models with different regularizers, which in turn have different generalization performances. In Section 4, we investigate the implications of our theoretical findings by applying a subclass of MD that is both efficient and scalable. Our experiments involving linear models corroborate our theoretical results in Section 3, and real-world experiments with deep neural networks and popular datasets suggest that our findings carry over to such nonlinear settings.
Researcher Affiliation | Academia | Haoyuan Sun, Khashayar Gatmiry, Kwangjun Ahn, Navid Azizan; Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Pseudocode | Yes | Listing 1: Sample PyTorch implementation of p-GD
Open Source Code | No | To illustrate that p-GD can be easily implemented, we show a proof-of-concept implementation in PyTorch. This implementation can directly replace existing optimizers and thus requires only minor changes to any existing training code. (Appendix H)
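The report only quotes the paper's description of its PyTorch proof of concept; to make the p-GD update concrete, here is a minimal NumPy sketch of a single step, assuming the standard p-norm potential ψ(w) = (1/p)‖w‖_p^p for mirror descent. The function name `pgd_step` and all parameter defaults are our own illustration, not the paper's Listing 1:

```python
import numpy as np

def pgd_step(w, grad, lr=1e-3, p=3.0):
    """One step of p-GD: mirror descent with potential (1/p) * ||w||_p^p.

    The mirror map is grad-psi(w)_i = sign(w_i) * |w_i|^(p-1); the update
    takes a gradient step in the dual space, then maps back to the primal.
    Assumes p > 1 so the inverse map is well defined.
    """
    # Map parameters into the dual (mirror) space.
    z = np.sign(w) * np.abs(w) ** (p - 1)
    # Plain gradient step on the dual variables.
    z = z - lr * grad
    # Map back to the primal space via the inverse mirror map.
    return np.sign(z) * np.abs(z) ** (1.0 / (p - 1))

# Toy usage: minimize 0.5 * ||w - target||^2 from a zero initialization.
w = np.zeros(2)
target = np.array([1.0, -2.0])
for _ in range(5000):
    w = pgd_step(w, w - target, lr=0.01, p=3.0)
```

With p = 2 this reduces to ordinary gradient descent, which is why such an optimizer can drop into existing training code with only minor changes.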
Open Datasets | Yes | Image classification on MNIST. For a more involved example, we apply p-GD to the MNIST dataset (LeCun et al., 1998). Specifically, we perform a set of experiments on the CIFAR-10 dataset (Krizhevsky et al., 2009). ImageNet experiments. We also perform a similar set of experiments on the ImageNet dataset (Russakovsky et al., 2015).
Dataset Splits | Yes | Image classification on MNIST. For a more involved example, we apply p-GD to the MNIST dataset (LeCun et al., 1998). For this task, we use two different architectures: 1) a 2-layer fully connected network with 300 hidden neurons and ReLU activation, and 2) a convolutional network with two convolution layers and batch-norm. We train the fully connected network for 200 epochs and the convolution network for 50 epochs. The detailed specification of this experiment can be found in Appendix I. For the experiments with the CIFAR-10 dataset, we adopted the example implementation from the FFCV library. For the experiments with the ImageNet dataset, we used the example implementation from the FFCV library.
Hardware Specification | Yes | All of the following experiments were performed on compute nodes equipped with an Intel Skylake CPU + one Nvidia V100 GPU.
Software Dependencies | No | To illustrate that p-GD can be easily implemented, we show a proof-of-concept implementation in PyTorch. For the experiments with the CIFAR-10 dataset, we adopted the example implementation from the FFCV library. For the experiments with the ImageNet dataset, we used the example implementation from the FFCV library. (No specific version numbers for PyTorch or the FFCV library are provided.)
Experiment Setup | Yes | We ran p-GD with fixed step size 10^-3 for 1 million steps. We used a fixed step size of η = 10^-4 and ran one million iterations for different p's. As for the normalized mirror descent update (9), we use a base step size η0 = 10^-3 and scale λ = 10^-3. For the fully connected network, we train for 200 epochs in total and use a learning rate schedule that starts with η = 0.1 and decays by a factor of 5 at the 120th, 150th, and 180th epochs. For both models, we applied cross-entropy loss and batch size of 512. We used a cyclic learning rate schedule with a maximum learning rate of 0.1 and ran for 400 epochs. We used a cyclic learning rate schedule with a maximum learning rate of 0.5 and ran for 120 epochs.
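The fully connected network's schedule quoted above (start at η = 0.1, divide by 5 at epochs 120, 150, and 180) can be expressed as a short step-decay function. This is a sketch of that schedule only; the name `step_decay_lr` and its signature are our own, not from the paper's code:

```python
def step_decay_lr(epoch, base_lr=0.1, decay=5.0, milestones=(120, 150, 180)):
    """Step-decay schedule: divide base_lr by `decay` at each milestone epoch.

    Mirrors the quoted MNIST fully connected setup: eta starts at 0.1 and
    drops by a factor of 5 at the 120th, 150th, and 180th epochs.
    """
    # Count how many milestones have already been passed.
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr / (decay ** drops)
```

In PyTorch the same effect is typically obtained with the built-in `torch.optim.lr_scheduler.MultiStepLR` (milestones [120, 150, 180], gamma 0.2).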