Unsupervised Domain Adaptation by Learning Using Privileged Information

Authors: Adam Breitholtz, Anton Matsson, Fredrik D. Johansson

TMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate the empirical benefits of learning using privileged information, compared to the other data availability settings in Table 1, across four UDA image classification tasks where PI is available in the forms described in Section 3. Widely used datasets for UDA evaluation like OfficeHome (Venkateswara et al., 2017) and large-scale benchmark suites like DomainBed (Gulrajani & Lopez-Paz, 2021), VisDA (Peng et al., 2017) and WILDS (Koh et al., 2021) do not include privileged information and cannot be used for evaluation here. Thus, we first compare our method to baselines on the recent CelebA task (Xie et al., 2020) which includes PI in the form of binary attributes (Section 4.1). Additionally, we propose three new tasks based on well-known image classification datasets with regions of interest as PI (Sections 4.2-4.4). In Sections 4.1 and 4.2, we use the two-stage estimator with the subnetwork f̂ based on the ResNet-18 architecture (He et al., 2016a). In Sections 4.3 and 4.4, we use our variant of Faster R-CNN with a ResNet-50 backbone.
Researcher Affiliation Academia Adam Breitholtz (EMAIL), Anton Matsson (EMAIL), Fredrik D. Johansson (EMAIL); Department of Computer Science, Chalmers University of Technology
Pseudocode Yes Algorithm 1 Training of the two-stage model.
1: procedure TWO_STAGE({(x_i, w_i, t_i, y_i)})
2:   Empirically minimize (1/m) Σ_{i=1}^m ||d(x_i) - t_i||^2 and obtain d̂.
3:   Empirically minimize (1/n) Σ_{i=1}^n CCE(g(w_i), y_i) and obtain ĝ.
4:   Compose d̂, ĝ, and φ into ĥ(x) = ĝ(φ(x, d̂(x))).
5: end procedure
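The two stages above can be sketched with simple stand-in models. This is a minimal illustration, not the paper's implementation: the data are synthetic, d is linear (so stage 1 reduces to least squares), g is a small multinomial logistic regression fit by gradient descent, and φ is assumed to simply pass d̂(x) through to ĝ. All shapes and names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (shapes are assumptions): x = inputs, t = privileged
# information targets for stage 1; w, y = inputs and labels for stage 2.
m, n, dx, dt = 200, 150, 8, 3
X = rng.normal(size=(m, dx))
T = X @ rng.normal(size=(dx, dt)) + 0.01 * rng.normal(size=(m, dt))

# Stage 1: empirically minimize (1/m) * sum ||d(x_i) - t_i||^2 to obtain d_hat.
# With a linear d, the minimizer is ordinary least squares.
D_hat, *_ = np.linalg.lstsq(X, T, rcond=None)

def d_hat(x):
    return x @ D_hat

# Stage 2: empirically minimize the mean cross-entropy CCE(g(w_i), y_i).
# A two-class logistic regression trained by gradient descent stands in for g.
W = rng.normal(size=(n, dt))
y = (W[:, 0] > 0).astype(int)      # synthetic binary labels
Y = np.eye(2)[y]
G = np.zeros((dt, 2))
for _ in range(500):
    P = np.exp(W @ G)
    P /= P.sum(axis=1, keepdims=True)
    G -= 0.1 * W.T @ (P - Y) / n   # gradient of the mean cross-entropy

def g_hat(w):
    return (w @ G).argmax(axis=1)

# Compose: h_hat(x) = g_hat(phi(x, d_hat(x))); phi is identity on d_hat(x) here.
def h_hat(x):
    return g_hat(d_hat(x))

preds = h_hat(X)
print(preds.shape)
```

The key design point the sketch preserves is that the two minimizations are independent: d̂ is fit on (x, t) pairs and ĝ on (w, y) pairs, and only the final composition ĥ ties them together.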
Open Source Code Yes All models were trained using NVIDIA Tesla A40 GPUs and the development and evaluation of this study required approximately 30,000 hours of GPU training. The code is available on GitHub: https://github.com/Healthy-AI/dalupi.
Open Datasets Yes We use the ChestX-ray8 dataset (Wang et al., 2017) as source domain and the CheXpert dataset (Irvin et al., 2019) as target domain. As PI, we use the regions of pixels associated with each found pathology, as annotated by domain experts using bounding boxes. For the CheXpert dataset, only pixel-level segmentations are available, and we create bounding boxes that tightly enclose the segmentations.
Dataset Splits Yes We use a subset of the CelebA dataset with 2,000 labeled source examples and 3,000 unlabeled target examples. We use 1,000 samples each for the source validation set, source test set, and target test set. The target oracle, SL-T, is trained using labels provided for the 3,000 target examples, with 20% of these examples set aside for validation. The same unlabeled validation set is used to validate the first DALUPI network, f̂.
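The split sizes described above can be mirrored with a simple index partition. This is an illustrative sketch only: the pool sizes, the assumption that the source validation/test sets come from a separate 4,000-example source pool, and all variable names are assumptions, not the paper's actual split code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical index pools (sizes assumed to make the described splits disjoint).
source_idx = rng.permutation(4000)   # source pool: train + validation + test
target_idx = rng.permutation(3000)   # the 3,000 target examples

source_train = source_idx[:2000]     # 2,000 labeled source examples
source_val   = source_idx[2000:3000] # 1,000 source validation samples
source_test  = source_idx[3000:]     # 1,000 source test samples

# SL-T oracle: trained on labeled target data, with 20% held out for validation.
n_val = int(0.2 * len(target_idx))   # 600 examples
slt_val   = target_idx[:n_val]
slt_train = target_idx[n_val:]

print(len(source_train), len(source_val), len(source_test), len(slt_val))
```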
Hardware Specification Yes All models were trained using NVIDIA Tesla A40 GPUs and the development and evaluation of this study required approximately 30,000 hours of GPU training.
Software Dependencies No The paper mentions software like Python, PyTorch, skorch, torchvision, TensorFlow, and ADAPT, but does not provide specific version numbers for these components. For example, it says "skorch (Tietz et al., 2017)" but not "skorch X.Y.Z".
Experiment Setup Yes For each task and task-specific setting (label skew, amount of privileged information, etc.), we train 10 models from each relevant class using hyperparameters randomly selected from given ranges (see Appendix A). For DANN and MDD, the trade-off parameter, which regularizes domain discrepancy in representation space, increases from 0 to 0.1 during training; for MDD, the margin parameter is set to 3. All models are evaluated on a held-out validation set from the source domain and the best-performing model in each class is then evaluated on held-out test sets from both domains. For SL-T, we use a held-out validation set from the target domain. We repeat this procedure over 5 or 10 seeds, controlling the data splits and the random number generation. We report accuracy and area under the ROC curve (AUC) with 95 % confidence intervals computed by bootstrapping over the seeds.
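The reported confidence intervals, computed by bootstrapping the per-seed metrics, can be sketched as follows. The accuracy values below are made up for illustration; they are not results from the paper.

```python
import numpy as np

# Hypothetical per-seed accuracies for one model class (e.g. 10 seeds).
acc = np.array([0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.79])

rng = np.random.default_rng(42)
B = 10_000

# Resample the seeds with replacement and record the mean of each resample.
boot_means = np.array([
    rng.choice(acc, size=acc.size, replace=True).mean() for _ in range(B)
])

# 95% percentile interval over the bootstrap distribution of the mean.
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean={acc.mean():.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Bootstrapping over seeds (rather than over test examples) captures run-to-run variability from initialization and data splits, which is the quantity the repeated-seed protocol above is designed to measure.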