Gated Domain Units for Multi-source Domain Generalization
Authors: Simon Föll, Alina Dubatovka, Eugen Ernst, Siu Lun Chau, Martin Maritsch, Patrik Okanovic, Gudrun Thaeter, Joachim M. Buhmann, Felix Wortmann, Krikamol Muandet
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify the I.E.D assumption by extensive experiments using the publicly available WILDS benchmark. Specifically, we validate our method on image, text, and graph datasets, showing consistent improvement on out-of-training target domains. These findings support the practicality of the I.E.D assumption and the effectiveness of GDUs for domain generalisation. Our experimental evaluations are then presented in Section 5. |
| Researcher Affiliation | Academia | 1Department of Management, Technology, and Economics, ETH Zurich, Switzerland 2Department of Computer Science, ETH Zurich, Switzerland 3Department of Mathematics, Karlsruhe Institute for Technology, Germany 4Institute for Technology Management, University of St. Gallen, Switzerland 5CISPA Helmholtz Center for Information Security, Germany |
| Pseudocode | No | The paper describes the GDU layer and model training with mathematical equations and formal definitions, such as Definition 1 and Proposition 3.1, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is publicly available for TensorFlow (https://github.com/im-ethz/pub-gdu4dg) and PyTorch (https://github.com/im-ethz/gdu4dg-pytorch). |
| Open Datasets | Yes | We verify the I.E.D assumption by extensive experiments using the publicly available WILDS benchmark. Specifically, we validate our method on image, text, and graph datasets, showing consistent improvement on out-of-training target domains. ...we create a multi-source dataset by combining five publicly available digits image datasets, namely MNIST Lecun et al. (1998), MNIST-M Ganin & Lempitsky (2015), SVHN Netzer et al. (2011), USPS, and Synthetic Digits (SYN) Ganin & Lempitsky (2015). ...using eight datasets: Camelyon17, FMoW, Amazon, iWildCam, RxRx1, OGB-MolPCBA, CivilComments, and PovertyMap. |
| Dataset Splits | Yes | Each dataset, except USPS, is split into training and test sets of 25,000 and 9,000 images, respectively. ...Camelyon17 comprises images of tissue patches from five different hospitals. While the first three hospitals are the source domains (302,436 examples), the fourth and fifth are the validation (34,904 examples) and test domain (85,054 examples), respectively. ...training (76,863 images; between 2002 and 2013), validation (19,915 images; between 2013 and 2016), and test (22,108 images; between 2016 and 2017). ...training (245,502 reviews from 1,252 reviewers), validation (100,050 reviews from 1,334 reviewers), test (100,050 reviews from 1,334 reviewers). ...243 training traps (129,809 images), 32 validation traps (14,961 images), and 48 test traps (42,791 images). ...training (40,612 images, 33 domains), validation (9,854 images, 4 domains), and test (34,432 images, 14 domains). ...training (44,930 domains), validation (31,361 domains), and test (43,739 domains). ...training (269,038 comments), validation (45,180 comments), and test (133,782 comments) set. ...The average size of each set across the 5 folds is 10,000 images (13-14 countries) for the training set, 4,000 images (4-5 different countries) for the validation set, and 4,000 images (13-14 countries) for the test set. |
| Hardware Specification | Yes | Although the Gated Domain layer requires more computation resources than the ERM models, all digits experiments were conducted on a single GPU (NVIDIA GeForce RTX 3090). |
| Software Dependencies | Yes | Our Digits experiments are implemented using TensorFlow 2.4.1 and TensorFlow Probability 0.12.1. For the WILDS benchmark we use PyTorch (version 1.11.0). |
| Experiment Setup | Yes | For training, we resorted to the Adam optimizer with a learning rate of 0.001. We used early stopping and selected the best model weights according to the validation accuracy. For the validation data, we used the combined test splits only of the respective source datasets. The batch size was set to 512. The hyper-parameters relevant for our layer are summarized in Table 7 in Appendix B.1. In Table 2, we present the final results for our proof-of-concept experiment. We compare the performance of the Gated Domain layer with different similarity functions (CS, MMD, Projected) trained in fine-tuning (FT) and end-to-end (E2E) modes. |
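The model-selection protocol quoted in the Experiment Setup row (early stopping with the best weights chosen by validation accuracy, alongside Adam with learning rate 0.001 and batch size 512) can be sketched framework-agnostically. The helper below is an illustrative assumption, not the authors' implementation; the `patience` value and the dummy accuracy sequence are hypothetical.

```python
# Sketch of best-weights early stopping as described in the paper's setup
# (Adam, lr=0.001, batch size 512 are the quoted hyper-parameters; the
# EarlyStopping helper itself is an assumption for illustration).
LEARNING_RATE = 1e-3
BATCH_SIZE = 512


class EarlyStopping:
    """Track the best validation accuracy seen so far and keep a snapshot
    of the corresponding weights; stop after `patience` epochs without
    improvement."""

    def __init__(self, patience: int = 5):
        self.patience = patience
        self.best_acc = float("-inf")
        self.best_weights = None
        self.bad_epochs = 0

    def step(self, val_acc: float, weights) -> bool:
        """Record one epoch; return True when training should stop."""
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            self.best_weights = weights  # snapshot of the best model
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical per-epoch validation accuracies for demonstration.
val_accs = [0.61, 0.70, 0.74, 0.73, 0.72, 0.71, 0.70, 0.69, 0.68]
stopper = EarlyStopping(patience=5)
for epoch, acc in enumerate(val_accs):
    if stopper.step(acc, weights=f"weights@epoch{epoch}"):
        break  # stops once five epochs pass without improvement

print(stopper.best_acc, stopper.best_weights)  # 0.74 weights@epoch2
```

In a real TensorFlow or PyTorch run, `weights` would be a deep copy of the model state rather than a string, and `val_acc` would come from evaluating on the combined test splits of the source datasets, as the paper describes.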