On Mixup Regularization
Authors: Luigi Carratino, Moustapha Cissé, Rodolphe Jenatton, Jean-Philippe Vert
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We corroborate our theoretical analysis with experiments that support our conclusions. Keywords: regularization, generalization, approximation, deep learning, mixup |
| Researcher Affiliation | Collaboration | Luigi Carratino (MaLGa, University of Genova, Italy); Moustapha Cissé (Google Research, Brain team, Accra); Rodolphe Jenatton (Google Research, Brain team, Berlin); Jean-Philippe Vert (Google Research, Brain team, Paris) |
| Pseudocode | Yes (sketch below) | Algorithm 1: Python code to evaluate, according to (22), functions learned with Mixup |
| Open Source Code | Yes | Code for reproducing results for ImageNet and various OOD variants is available at https://github.com/google/uncertainty-baselines/tree/master/baselines/imagenet |
| Open Datasets | Yes | We provide empirical support for our interpretation of Mixup regularization. All details about experiments are provided in Appendix B, together with other experiments on the simpler setting of learning on the two-moon dataset with random features. To support our discussion, we provide empirical results on CIFAR-10/100 and ImageNet for different networks (LeNet, ResNet-34/50). |
| Dataset Splits | Yes (sketch below) | Data generation. To generate the data we use the sklearn.datasets.make_moons function from the scikit-learn library. We create n = 300 points with noise=0.01, and split them in 50% for train and 50% for test. |
| Hardware Specification | No | The paper does not mention specific hardware components such as GPU models (e.g., NVIDIA A100, RTX 2080 Ti), CPU models (e.g., Intel Xeon, AMD Ryzen), or TPU versions used for running the experiments. It only refers to networks like ResNet-34/50. |
| Software Dependencies | No | The paper mentions software like 'scipy.special', 'torch', and 'scikit-learn' but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes (sketch below) | Appendix B.1 CIFAR-10 and CIFAR-100: For both architectures the optimizer used for training is SGD with momentum 0.9 for 200 epochs with mini-batch size 128, weight decay 5·10⁻⁴. For ResNet-34 the learning rate is 0.1, reduced by a factor of 10 at epochs 60, 120, 160. For LeNet the learning rate is 0.01, reduced by a factor of 10 at epoch 100. Appendix B.2 ImageNet: The optimizer used for training is SGD with Nesterov momentum 0.9 for 200 epochs with mini-batch size 4096 (32 cores × 128 per-core batch size), weight decay 5·10⁻⁵. The learning rate is 1.6, reduced by a factor of 10 at epochs 66, 133, 177. Appendix B.3 Two Moons with Random Features: Optimization. To minimize any functional we use stochastic gradient descent with mini-batching, with mini-batch size b = 50 and step size γ = 5. Mixup hyperparameter. We consider the Beta distribution in Mixup and its approximation to be Beta(α, α) with α = 1. |
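The Pseudocode row points to the paper's Algorithm 1, which evaluates already-trained models according to the paper's equation (22); that listing is not reproduced here. For context, below is a minimal sketch of standard Mixup batch construction, the procedure the paper analyzes, with λ drawn from Beta(α, α) and α = 1 as reported for the Two Moons setup. The function name `mixup_batch` is a placeholder of ours, not the paper's.

```python
# Minimal sketch of standard Mixup batch construction (Zhang et al., 2018),
# the procedure the paper analyzes -- NOT the paper's Algorithm 1, which
# instead evaluates already-trained models via its equation (22).
import numpy as np
import torch

def mixup_batch(x, y, alpha=1.0):
    """Mix a batch (x, y) with a shuffled copy of itself.

    lambda ~ Beta(alpha, alpha); alpha = 1.0 matches the Beta(1, 1)
    hyperparameter reported in Appendix B.3.
    """
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(x.size(0))
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y + (1.0 - lam) * y[perm]  # y as one-hot / soft labels
    return x_mixed, y_mixed
```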
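The Dataset Splits row quotes enough detail to reconstruct the two-moons data. A sketch under the reported parameters (n = 300, noise = 0.01, 50/50 train/test split) follows; the `random_state` seeds are assumptions, since the quote specifies none.

```python
# Reconstruction of the two-moons data generation described in the
# Dataset Splits row: n = 300 points, noise = 0.01, 50/50 train/test.
# The random_state values are placeholders; the paper's seeds are not quoted.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=300, noise=0.01, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)
```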
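The Experiment Setup row fully specifies the CIFAR-10/100 ResNet-34 optimizer, so it can be sketched in PyTorch. In the sketch below, `model` and the loop body are stand-ins; only the optimizer settings, schedule, and epoch count come from the quoted Appendix B.1.

```python
# Sketch of the CIFAR-10/100 ResNet-34 recipe from Appendix B.1: SGD with
# momentum 0.9, weight decay 5e-4, LR 0.1 cut by 10x at epochs 60/120/160,
# 200 epochs, mini-batch size 128. `model` and the data loop are placeholders.
import torch

model = torch.nn.Linear(32 * 32 * 3, 10)  # stand-in for ResNet-34
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160], gamma=0.1)

for epoch in range(200):
    # ... one pass over mini-batches of size 128 goes here ...
    scheduler.step()  # decay the learning rate once per epoch
```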