Empirical Study on Optimizer Selection for Out-of-Distribution Generalization
Authors: Hiroki Naganuma, Kartik Ahuja, Shiro Takagi, Tetsuya Motokawa, Rio Yokota, Kohta Ishikawa, Ikuro Sato, Ioannis Mitliagkas
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this study, we examine the performance of popular first-order optimizers for different classes of distributional shift under empirical risk minimization and invariant risk minimization. We address this question for image and text classification using DomainBed, WILDS, and the Backgrounds Challenge as testbeds for studying different types of shifts, namely correlation and diversity shift. We search over a wide range of hyperparameters and examine classification accuracy (in-distribution and out-of-distribution) for over 20,000 models. |
| Researcher Affiliation | Collaboration | 1. Mila Quebec AI Institute; 2. Université de Montréal; 3. Independent Researcher; 4. University of Tsukuba; 5. Tokyo Institute of Technology; 6. Denso IT Laboratory Inc.; 7. Canada CIFAR AI Chair |
| Pseudocode | Yes | Algorithm 1 Generic adaptive optimization method setup. |
| Open Source Code | Yes | Our code can be found at the link below. https://github.com/Hiroki11x/Optimizer_Comparison_OOD |
| Open Datasets | Yes | We evaluate the OOD generalization performance of these optimizers on 10 different benchmarks: DomainBed (which includes seven image datasets) (Gulrajani & Lopez-Paz, 2021), the Backgrounds Challenge dataset (Xiao et al., 2021), and CivilComments-WILDS (Koh et al., 2021). Image Classification Datasets: DomainBed consists of a set of benchmark datasets for domain generalization, which includes PACS (Fang et al., 2013), VLCS (Li et al., 2017), Office-Home (Venkateswara et al., 2017), Terra Incognita (Beery et al., 2018), DomainNet (Peng et al., 2019), Rotated MNIST (Ghifary et al., 2015), and Colored MNIST (Arjovsky et al., 2019). The Backgrounds Challenge dataset measures a model's robustness against background shift (Xiao et al., 2021). To further strengthen our claim, we also performed experiments on CIFAR10-C and CIFAR10-P, which can be cast as image corruption and perturbation shift. Natural Language Processing (NLP) Datasets: The CivilComments-WILDS dataset is cast as a subpopulation shift problem. |
| Dataset Splits | Yes | Our first approach partitions the data from the training domains into a training set and a validation set, then selects the model with the highest average accuracy on the training-domain validation data. In the training phase of the DomainBed datasets, we do not access the data in the test domain but split data from the training domains into a training set and a validation set; the split ratio is 80% for training and 20% for validation. In CivilComments-WILDS, we divide the data into training, validation, and test sets and maximize worst-group accuracy on the validation data (and by association, maximize the average accuracy over all domains). |
| Hardware Specification | No | We perform our experiments with ABCI (AI Bridging Cloud Infrastructure), a supercomputer owned by the National Institute of Advanced Industrial Science and Technology (AIST), and TSUBAME 3.0, a supercomputer owned by the Tokyo Institute of Technology. The computational resources instrumental to this study were provided under the auspices of the "ABCI Grand Challenge" Program (AIST) and the TSUBAME Grand Challenge Program (Tokyo Institute of Technology). |
| Software Dependencies | No | All code for the experiments is a modification of the code provided by the authors who introduced the datasets (Gulrajani & Lopez-Paz, 2021; Koh et al., 2021; Xiao et al., 2021). The DomainBed (Gulrajani & Lopez-Paz, 2021) and WILDS (Koh et al., 2021) code is released under the MIT license; the Backgrounds Challenge code does not indicate a license. Our code can be found at the link below. https://github.com/Hiroki11x/Optimizer_Comparison_OOD |
| Experiment Setup | Yes | We describe the configurations of hyperparameters and the protocol for the experiments in further detail in Appendix E and Appendix D, respectively. Hyperparameter Tuning: The hyperparameters are tuned using the Bayesian optimization functionality of Weights & Biases, evaluating in-distribution validation accuracy. Table 5: DomainBed: Workloads; Table 6: DomainBed: ResNet-50; Table 7: DomainBed: MNIST ConvNet (Gulrajani & Lopez-Paz, 2021) |
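The "Algorithm 1" cited in the Pseudocode row is a generic first-order template that the paper instantiates as SGD, momentum SGD, Adam, and related adaptive methods. As a rough illustration only (not the paper's actual code; `adaptive_step` is a hypothetical name), the Adam instantiation of such a template looks like:

```python
import math

def adaptive_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One step of a generic adaptive method, instantiated here as Adam.

    Other members of the family fall out of the same template, e.g.
    beta1 = 0 drops momentum and leaves a sign-normalized SGD-style update.
    """
    m = beta1 * m + (1 - beta1) * grad        # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (scale) estimate
    m_hat = m / (1 - beta1 ** t)              # bias corrections
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = w^2 (gradient 2w) for a few steps as a smoke test
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 11):
    w, m, v = adaptive_step(w, 2 * w, m, v, t, lr=0.1)
```

Because the first-step update magnitude of Adam is roughly `lr` regardless of gradient scale, ten steps at `lr=0.1` move `w` from 1.0 to near the minimum at 0.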
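The training-domain validation protocol quoted in the Dataset Splits row (80/20 split of training-domain data, then pick the model with the best average validation accuracy) can be sketched as follows; `split_train_val` and `select_best` are illustrative names, not functions from the paper's repository:

```python
import random

def split_train_val(examples, val_frac=0.2, seed=0):
    """Shuffle one training domain's examples and split them 80/20 (the paper's ratio)."""
    rng = random.Random(seed)
    idx = list(range(len(examples)))
    rng.shuffle(idx)
    n_val = int(len(examples) * val_frac)
    # validation takes the first n_val shuffled indices, training takes the rest
    return [examples[i] for i in idx[n_val:]], [examples[i] for i in idx[:n_val]]

def select_best(val_acc_per_model):
    """Training-domain model selection: highest validation accuracy averaged over domains."""
    return max(val_acc_per_model,
               key=lambda name: sum(val_acc_per_model[name]) / len(val_acc_per_model[name]))

train, val = split_train_val(list(range(100)))
# per-model validation accuracy on two training domains (made-up numbers)
best = select_best({"adam": [0.71, 0.65], "sgd": [0.68, 0.70]})
```

Note that the test domain never enters either function, matching the quoted constraint that test-domain data is not accessed during training or selection.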