Clustering-Based Validation Splits for Model Selection under Domain Shift

Authors: Andrea Napoli, Paul White

TMLR 2025

Reproducibility assessment (variable, result, and supporting excerpt from the LLM response):
Research Type: Experimental. Evidence: "In experiments, the technique consistently outperforms alternative splitting strategies across a range of datasets and training algorithms, for both domain generalisation and unsupervised domain adaptation tasks. Analysis also shows the MMD between the training and validation sets to be well-correlated with test domain accuracy, further substantiating the validity of this approach." (Section 5, Experiments)
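The quoted analysis correlates the MMD between training and validation sets with test-domain accuracy. As a point of reference, a biased estimator of the squared MMD under a Gaussian kernel (the paper's Table 6 reports a bandwidth of 1) can be sketched as follows; the function names are illustrative and this is not the paper's code:

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Pairwise Gaussian (RBF) kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

def mmd_squared(X, Y, bandwidth=1.0):
    """Biased estimate of the squared maximum mean discrepancy between samples X and Y."""
    k_xx = gaussian_kernel(X, X, bandwidth).mean()
    k_yy = gaussian_kernel(Y, Y, bandwidth).mean()
    k_xy = gaussian_kernel(X, Y, bandwidth).mean()
    return k_xx + k_yy - 2 * k_xy
```

A split whose validation set drifts far from the training set in distribution would show up as a large value of this statistic.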
Researcher Affiliation: Academia. Evidence: "Andrea Napoli & Paul White, EMAIL, Institute of Sound and Vibration Research, University of Southampton, UK"
Pseudocode: Yes. Evidence: "Algorithm 1: Constrained kernel k-means clustering"
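Algorithm 1 is a constrained kernel k-means procedure. The unconstrained core of kernel k-means on a precomputed kernel matrix can be sketched as below; the paper's size constraints (solved as LPs with Gurobi, per the software-dependencies row) are omitted here, and all names are illustrative:

```python
import numpy as np

def kernel_kmeans(K, n_clusters, n_iters=50, init_labels=None, seed=0):
    """Plain (unconstrained) kernel k-means on a precomputed kernel matrix K.

    Hypothetical sketch only: the paper's Algorithm 1 additionally enforces
    constraints on the clusters, which are not implemented here.
    """
    n = K.shape[0]
    if init_labels is None:
        labels = np.random.default_rng(seed).integers(n_clusters, size=n)
    else:
        labels = np.asarray(init_labels).copy()
    diag = np.diag(K)
    for _ in range(n_iters):
        dists = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            mask = labels == c
            size = int(mask.sum())
            if size == 0:
                continue  # empty cluster: leave its distances at infinity
            # Distance in feature space to the cluster mean:
            # ||phi(x_i) - mu_c||^2 = K_ii - (2/|c|) sum_{j in c} K_ij
            #                        + (1/|c|^2) sum_{j,l in c} K_jl
            dists[:, c] = (diag
                           - 2.0 * K[:, mask].sum(axis=1) / size
                           + K[np.ix_(mask, mask)].sum() / size ** 2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable; converged
        labels = new_labels
    return labels
```

The update never needs explicit feature vectors: all distances are expressed through kernel evaluations, which is what makes the clustering compatible with the Gaussian kernel used for the MMD.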
Open Source Code: No. The paper neither links to source code for the described methodology nor states its release or availability in supplementary materials. The OpenReview link provided points to a review forum, not a code repository.
Open Datasets: Yes. Evidence: "Camelyon17-WILDS (Bándi et al., 2019; Koh et al., 2021): tumour detection in tissue samples across 5 hospitals... License: CC0. ... SVIRO (Dias Da Cruz et al., 2020): classification of vehicle rear-seat occupancy... License: CC BY-NC-SA 4.0. ... Terra Incognita (Beery et al., 2018): classification of wild animals... License: CDLA-Permissive 1.0."
Dataset Splits: Yes. Evidence: "S must be partitioned into training and validation sets, T and V respectively. ... T and V should be of sizes determined by a user-defined holdout fraction h satisfying 0 < h < 1." Table 6 reports a holdout fraction of 0.2 and a UDA holdout fraction of 0.5. "Every domain is tested 3 times for reproducibility, each time with a different random seed for model initialisation, hyperparameter search and other stochastic variables. The reported accuracy values are averages over all domains and repeats."
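The splitting rule above fixes the sizes of T and V through a user-defined holdout fraction h. A minimal sketch of turning cluster labels into such a split, under the simplifying assumption that whole clusters are moved into the validation set until it holds roughly a fraction h of the samples (the paper's constrained clustering controls the sizes more precisely), might look like:

```python
import numpy as np

def split_by_clusters(labels, holdout_fraction=0.2, seed=0):
    """Hypothetical sketch: build train/validation index sets from cluster
    labels, assigning whole clusters to validation until it reaches about
    a `holdout_fraction` share of the samples (0 < h < 1)."""
    labels = np.asarray(labels)
    n = len(labels)
    target = holdout_fraction * n
    clusters = np.random.default_rng(seed).permutation(np.unique(labels))
    val_idx = []
    for c in clusters:
        if len(val_idx) >= target:
            break  # validation set has reached the requested share
        val_idx.extend(np.flatnonzero(labels == c).tolist())
    val = np.array(sorted(val_idx))
    train = np.setdiff1d(np.arange(n), val)
    return train, val
```

Moving whole clusters, rather than random samples, is what lets the validation set mimic a distribution shift away from the training set.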
Hardware Specification: No. Evidence: "The authors acknowledge the use of the IRIDIS High Performance Computing Facility, and associated support services at the University of Southampton, in the completion of this work. ... In total, the experiments involve training 5,160 models, requiring around 100 GPU-days of computation." These statements indicate the use of computing resources but lack specific hardware details such as GPU/CPU models.
Software Dependencies: Yes. Evidence: "Experiments are conducted using the DomainBed framework (Gulrajani & Lopez-Paz, 2021). This means all but one of the domains are placed in the development set... The Gurobi Optimizer (Gurobi Optimization, LLC, 2023) is used to solve the LPs."
Experiment Setup: Yes. Evidence (Table 6, general parameter values and training details):
Hyperparameter random search size: 10
Number of trials: 3
Holdout fraction: 0.2
UDA holdout fraction: 0.5
Number of training steps: 3000
Gaussian kernel bandwidth: 1
Finetuning iterations before split: 3000
Nyström subset size (if applicable): 2000
Architecture: ResNet-18
Class balanced: True
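For reuse in a re-implementation, the Table 6 values could be collected into a configuration mapping like the following; the key names are hypothetical, but every value is taken from the paper's reported setup:

```python
# Hypothetical config mirroring Table 6 of the paper; key names are
# illustrative, values are the reported experimental parameters.
EXPERIMENT_CONFIG = {
    "hparam_random_search_size": 10,
    "n_trials": 3,
    "holdout_fraction": 0.2,
    "uda_holdout_fraction": 0.5,
    "n_training_steps": 3000,
    "gaussian_kernel_bandwidth": 1.0,
    "finetune_iters_before_split": 3000,
    "nystrom_subset_size": 2000,  # if applicable
    "architecture": "ResNet-18",
    "class_balanced": True,
}
```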