Gradual Domain Adaptation: Theory and Algorithms

Authors: Yifei He, Haoxiang Wang, Bo Li, Han Zhao

JMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Empirically, we demonstrate that our GOAT framework can improve the performance of standard GDA when the given intermediate domains are scarce, significantly broadening the real-world application scenarios of GDA. Our code is available at https://github.com/uiuctml/GOAT. ... 5. Experiments The goal of our experiments is to demonstrate the performance gain of training on generated intermediate domains in addition to given domains. We compare our method with gradual self-training (Kumar et al., 2020), which only self-trains a model along the sequence of given domains iteratively. In Sec. 5.4, we further analyze the choices of encoder E and transport plan γ used by Algorithm 1. More details of our experiments are provided in Appendix D.
Researcher Affiliation Academia Yifei He EMAIL University of Illinois Urbana-Champaign Haoxiang Wang EMAIL University of Illinois Urbana-Champaign Bo Li EMAIL University of Chicago Han Zhao EMAIL University of Illinois Urbana-Champaign
Pseudocode Yes Algorithm 1 Generative Gradual Domain Adaptation with Optimal Transport (GOAT)
Require: S^X_0 = {x_{0i}}_{i=1}^m, S^X_T = {x_{Ti}}_{i=1}^n; encoder E; source-trained classifier h_0
Encode: S^Z_0 = {z_{0i} = E(x_{0i})}_{i=1}^m, S^Z_T = {z_{Tj} = E(x_{Tj})}_{j=1}^n
Optimal Transport (OT): Solve for the OT plan γ ∈ R^{m×n}_{≥0} between S^Z_0 and S^Z_T
Cutoff: Use a cutoff threshold to keep O(n+m) elements of γ above the threshold and zero out the rest // only applies to the entropy-regularized version of OT
Intermediate Domain Generation:
for t = 1, ..., T do
    Initialize an empty set S^Z_t
    for each non-zero element γ_{ij} of γ do
        z = ((T − t)/T) z_{0i} + (t/T) z_{Tj}
        Add (z, γ_{ij}) to S^Z_t
    end for
end for
Gradual Domain Adaptation:
for t = 1, ..., T do
    h_t = ST(h_{t−1}, S_t) // can also apply sample weights to losses based on γ_{ij}
end for
Output: target-adapted classifier h_T
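The generation loop of Algorithm 1 can be illustrated with a minimal NumPy sketch. It assumes the OT plan `gamma` (m × n) between encoded source features `z0` and target features `zT` has already been computed and cut off to O(m+n) nonzero entries; the evenly spaced interpolation weights t/T are an assumption about the spacing, and the function name is hypothetical.

```python
import numpy as np

def goat_intermediate_domains(z0, zT, gamma, T):
    """Interpolate each transported pair (z0[i], zT[j]) at T evenly
    spaced points, yielding T synthetic intermediate feature sets."""
    ii, jj = np.nonzero(gamma)        # surviving transport pairs after cutoff
    weights = gamma[ii, jj]           # per-sample weights from the plan
    domains = []
    for t in range(1, T + 1):
        lam = t / T                   # fraction of the way toward the target
        z_t = (1 - lam) * z0[ii] + lam * zT[jj]
        domains.append((z_t, weights))
    return domains
```

Each returned pair `(z_t, weights)` is one synthetic domain; the weights can be applied to the self-training loss, as the algorithm's comment suggests.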
Open Source Code Yes Our code is available at https://github.com/uiuctml/GOAT.
Open Datasets Yes Empirically, we conduct experiments on Rotated MNIST, Color-Shift MNIST, Portraits (Ginosar et al., 2015) and Cover Type (Blackard and Dean, 1999), four benchmark datasets commonly used in the literature of GDA. ... Rotated MNIST A semi-synthetic dataset built on the MNIST dataset (LeCun and Cortes, 1998)...
Dataset Splits Yes Rotated MNIST A semi-synthetic dataset built on the MNIST dataset (LeCun and Cortes, 1998), with 50K images as the source domain and the same 50K images rotated by 45 degrees as the target domain. Intermediate domains are evenly distributed between the source and target. ... Portraits (Ginosar et al., 2015) ...the dataset is sorted chronologically and split into a source domain (first 2000 images), 7 intermediate domains (next 14000 images), and a target domain (last 2000 images). ... Cover Type (Blackard and Dean, 1999) ...splitting the data into a source domain (first 50K data), 10 intermediate domains (each with 40K data) and a target domain (final 50K data).
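The Cover Type split quoted above reduces to plain index slicing. A minimal sketch, assuming `data` is the sorted sample array and using a hypothetical helper name (the target is taken as the final 50K rows, so any rows between the last intermediate chunk and the target are simply unused):

```python
import numpy as np

def covertype_splits(data):
    """First 50K = source, ten chunks of 40K = intermediate domains,
    final 50K = target, per the split described in the paper."""
    source = data[:50_000]
    intermediates = [data[50_000 + k * 40_000 : 50_000 + (k + 1) * 40_000]
                     for k in range(10)]
    target = data[-50_000:]
    return source, intermediates, target
```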
Hardware Specification Yes Our code is built in PyTorch (Paszke et al., 2019), and our experiments are run on NVIDIA RTX A6000 GPUs.
Software Dependencies Yes Our code is built in PyTorch (Paszke et al., 2019)... To calculate the optimal transport plan between the source and target, we use the Earth Mover Distance solver from (Flamary et al., 2021).
Experiment Setup Yes For Rotated MNIST, Color-Shift MNIST and Portraits, we use a convolutional neural network (CNN) of 4 convolutional layers of 32 channels followed by 3 fully-connected layers of 1024 hidden neurons, with ReLU activation. For Cover Type, we use a multi-layer perceptron (MLP) of 3 hidden layers with 256 hidden neurons. We also adopt common practices of Adam optimizer (Kingma and Ba, 2015), Dropout (Srivastava et al., 2014), and BatchNorm (Ioffe and Szegedy, 2015). ... The encoder and decoder are jointly trained on data from source and target in an unsupervised manner with the Adam optimizer (Kingma and Ba, 2015) (learning rate 10^-4, batch size 512).
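The CNN described above can be sketched in PyTorch. This is a minimal reconstruction under assumptions: the quote fixes 4 conv layers of 32 channels, 3 fully-connected layers of 1024 hidden neurons, ReLU, Dropout, and BatchNorm, but kernel sizes, pooling, dropout rate, and layer placement are guesses, and the class name is hypothetical.

```python
import torch
import torch.nn as nn

class GDAConvNet(nn.Module):
    """4 conv layers (32 channels each) + 3 FC layers (1024 hidden units)."""
    def __init__(self, in_channels=1, num_classes=10):
        super().__init__()
        convs, c = [], in_channels
        for _ in range(4):
            convs += [nn.Conv2d(c, 32, kernel_size=3, padding=1),
                      nn.BatchNorm2d(32), nn.ReLU()]
            c = 32
        # Pool to a fixed 4x4 map so the FC head is input-size agnostic
        self.features = nn.Sequential(*convs, nn.AdaptiveAvgPool2d(4))
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, 1024), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(1024, num_classes))

    def forward(self, x):
        return self.classifier(self.features(x))
```

A forward pass on a 28×28 MNIST-shaped batch yields per-class logits, e.g. `GDAConvNet()(torch.zeros(2, 1, 28, 28))` has shape `(2, 10)`.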