On Mixup Regularization

Authors: Luigi Carratino, Moustapha Cissé, Rodolphe Jenatton, Jean-Philippe Vert

JMLR 2022

Reproducibility assessment: each entry lists the variable, the result, and the supporting excerpt or response.
Research Type: Experimental. "We corroborate our theoretical analysis with experiments that support our conclusions." Keywords: regularization, generalization, approximation, deep learning, mixup.
Researcher Affiliation: Collaboration. Luigi Carratino (EMAIL), MaLGa, University of Genova, Italy; Moustapha Cissé (EMAIL), Google Research, Brain Team, Accra; Rodolphe Jenatton (EMAIL), Google Research, Brain Team, Berlin; Jean-Philippe Vert (EMAIL), Google Research, Brain Team, Paris.
Pseudocode: Yes. Algorithm 1 gives Python code to evaluate, according to Eq. (22), functions learned with Mixup.
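The mixing step that Mixup training is built on can be sketched as follows. This is an illustrative NumPy sketch, not the paper's Algorithm 1 (which concerns evaluating the learned function via Eq. (22)); the helper name `mixup_batch` is an assumption, and λ is drawn from Beta(α, α) as in Mixup.

```python
import numpy as np

def mixup_batch(x, y, alpha=1.0, rng=None):
    """Illustrative Mixup step (hypothetical helper, not the paper's code).

    Draws lam ~ Beta(alpha, alpha) and returns convex combinations of a
    batch with a random permutation of itself, for both inputs and
    (one-hot) labels.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    lam = rng.beta(alpha, alpha)        # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))      # pair each example with a random one
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix, lam
```

With α = 1 (the value used in the two-moons experiments below), Beta(1, 1) is the uniform distribution on [0, 1].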
Open Source Code: Yes. "Code for reproducing results on ImageNet and various OOD variants is available at https://github.com/google/uncertainty-baselines/tree/master/baselines/imagenet"
Open Datasets: Yes. "We provide empirical support for our interpretation of Mixup regularization. All details about experiments are provided in Appendix B, together with other experiments on the simpler setting of learning on the two-moon dataset with random features. To support our discussion, we provide empirical results on CIFAR-10/100 and ImageNet for different networks (LeNet, ResNet-34/50)."
Dataset Splits: Yes. "Data generation. To generate the data we use the sklearn.datasets.make_moons function from the scikit-learn library. We create n = 300 points with noise = 0.01, and split them into 50% for train and 50% for test."
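The two-moons data generation quoted above can be sketched directly with scikit-learn. The `random_state` values here are an assumption added for reproducibility; the excerpt does not report seeds.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Two-moons data as described: n = 300 points with noise = 0.01,
# split 50%/50% into train and test sets.
# random_state is an assumption; the paper does not report seeds.
X, y = make_moons(n_samples=300, noise=0.01, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)
```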
Hardware Specification: No. The paper does not mention specific hardware such as GPU models (e.g., NVIDIA A100, RTX 2080 Ti), CPU models (e.g., Intel Xeon, AMD Ryzen), or TPU versions used for the experiments; it only refers to networks such as ResNet-34/50.
Software Dependencies: No. The paper mentions software such as scipy.special, torch, and scikit-learn, but does not provide version numbers for any of them.
Experiment Setup: Yes. Appendix B.1, CIFAR-10 and CIFAR-100: For both architectures the optimizer is SGD with momentum 0.9 for 200 epochs, with mini-batch size 128 and weight decay 5·10⁻⁴. For ResNet-34 the learning rate is 0.1, reduced by a factor of 10 at epochs 60, 120, and 160. For LeNet the learning rate is 0.01, reduced by a factor of 10 at epoch 100. Appendix B.2, ImageNet: The optimizer is SGD with Nesterov momentum 0.9 for 200 epochs, with mini-batch size 4096 (32 cores × 128 per-core batch size) and weight decay 5·10⁻⁵. The learning rate is 1.6, reduced by a factor of 10 at epochs 66, 133, and 177. Appendix B.3, Two Moons with Random Features: Optimization: to minimize any functional, stochastic gradient descent with mini-batching is used, with mini-batch size b = 50 and step size γ = 5. Mixup hyperparameter: the Beta distribution in Mixup and its approximation is Beta(α, α) with α = 1.
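The step learning-rate schedules above can be written as a small helper; `lr_at_epoch` is a hypothetical name, and its defaults encode the CIFAR ResNet-34 settings quoted above (base rate 0.1, reduced by a factor of 10 at epochs 60, 120, and 160).

```python
def lr_at_epoch(epoch, base_lr=0.1, milestones=(60, 120, 160), gamma=0.1):
    """Step schedule: multiply the learning rate by gamma at each milestone.

    Defaults match the CIFAR ResNet-34 setup; for the ImageNet setup one
    would pass base_lr=1.6 and milestones=(66, 133, 177), and for LeNet
    base_lr=0.01 and milestones=(100,).
    """
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```

In PyTorch, the same schedule is what `torch.optim.lr_scheduler.MultiStepLR` implements when given these milestones and `gamma=0.1`.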