Learning Augmentation Distributions using Transformed Risk Minimization

Authors: Evangelos Chatzipantazis, Stefanos Pertigkiozoglou, Kostas Daniilidis, Edgar Dobriban

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide experiments with SCALE. Our first experiment shows that on rot MNIST, SCALE learns the correct symmetries and outperforms previous methods, which supports our zero-gap theorem. We also perform experiments on CIFAR-10 and CIFAR-100, showing that SCALE has advantages in both accuracy and calibration compared to prior works, while maintaining the benefit of time efficiency.
Researcher Affiliation | Academia | Evangelos Chatzipantazis (EMAIL), Department of Computer and Information Science, University of Pennsylvania; Stefanos Pertigkiozoglou (EMAIL), Department of Computer and Information Science, University of Pennsylvania; Kostas Daniilidis (EMAIL), Department of Computer and Information Science, University of Pennsylvania; Edgar Dobriban (EMAIL), Department of Statistics and Data Science, University of Pennsylvania
Pseudocode | Yes | Algorithm 1 (SCALE: Training for learning augmentations using TRM)
Open Source Code | No | The paper does not contain any explicit statements or links indicating the release of open-source code for the described methodology.
Open Datasets | Yes | Our first experiment shows that on rot MNIST, SCALE learns the correct symmetries and outperforms previous methods, which supports our zero-gap theorem. We also perform experiments on CIFAR-10 and CIFAR-100, showing that SCALE has advantages in both accuracy and calibration compared to prior works, while maintaining the benefit of time efficiency. Using the parametrization from Section 5.1, we train on MNIST (LeCun et al., 1998) and rotated MNIST (rot MNIST). Using SCALE and the parametrization shown in Section 5.1, we learn useful data augmentations on the CIFAR-10 and CIFAR-100 datasets (Krizhevsky, 2009).
Dataset Splits | Yes | SVHN contains three sets of samples: the train set containing 73,257 difficult training samples, the extra set containing 531,131 less difficult training samples, and the test set containing 26,032 test samples.
Hardware Specification | Yes | The experiments were executed using GeForce RTX 2080 Ti GPUs.
Software Dependencies | No | We use the implementation from the Torchvision package within PyTorch (Paszke et al., 2019) with padding equal to four. This mentions PyTorch and Torchvision, but specific version numbers for these software packages as used in the experiments are not provided.
Experiment Setup | Yes | For MNIST/rot MNIST we use a batch size of 128, while for CIFAR-10/100 we use a batch size of 64. We initialize all αi to 0.1 and all πi to 1/K, where recall that K is the number of transformations composed. Additionally, to avoid having πi converge to zero or unity at the beginning of the training, we constrain all πi to the interval [c, 1 − c]. We initialize c = 0.4/K and reduce it linearly (by a constant after each epoch) so that at the end of the training it equals zero. ... We use M = 4 to reduce the required computation. ... To optimize the network's parameters we use the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.02 and with cosine annealing (Loshchilov & Hutter, 2017). For the regularization term, in the rot MNIST and MNIST experiments λreg is set to 0.006. ... We train for 200 epochs by jointly optimizing the parameters of Qθ and the parameters of the network, then we train for another 100 epochs by optimizing only the parameters of the network. CIFAR-10/100: We use a WideResNet-28-10 (Zagoruyko & Komodakis, 2016). We optimize the parameters of the network with an SGD optimizer with learning rate of 0.1, cosine annealing, weight decay of 0.0001 and Nesterov momentum (Nesterov, 1983) with value 0.9. For the CIFAR-10 experiment, λreg is set to 0.01; for CIFAR-100, λreg is set to 0.02. ... We train for 200 epochs by jointly optimizing both the parameters Qθ and the network parameters.
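The probability-constraint schedule described above (clamp each πi to [c, 1 − c], with c annealed linearly from 0.4/K to zero over training) can be sketched in plain Python. The function names are illustrative; the paper's implementation is not released:

```python
# Sketch of the pi-constraint schedule from the experiment setup.
# Names (c_schedule, clamp_probs) are illustrative, not from the paper's code.

def c_schedule(epoch, total_epochs, K):
    """Linearly anneal the clamp bound c from 0.4/K at epoch 0 to 0 at the end."""
    c0 = 0.4 / K
    return c0 * (1.0 - epoch / total_epochs)

def clamp_probs(pi, c):
    """Constrain each probability pi_i to [c, 1 - c] so none collapses to 0 or 1."""
    return [min(max(p, c), 1.0 - c) for p in pi]

K = 4                         # number of composed transformations
pi = [1.0 / K] * K            # initialize all pi_i to 1/K, as in the setup
c = c_schedule(0, 200, K)     # c = 0.4/K = 0.1 at the start of training
pi = clamp_probs(pi, c)       # no-op here since 1/K already lies in [c, 1 - c]
```

The linear decay means the constraint is strict early in training (keeping the augmentation distribution from collapsing before Qθ has learned anything) and vanishes by the final epoch, when πi is free to reach any value in [0, 1].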