Understanding and Robustifying Sub-domain Alignment for Domain Adaptation

Authors: Yiling Liu, Juncheng Dong, Ziyang Jiang, Ahmed Aloui, Keyu Li, Michael Hunter Klein, Vahid Tarokh, David Carlson

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental. Empirical experiments across various benchmarks validate our theoretical insights, prove the necessity of the proposed adaptation strategy, and demonstrate the algorithm's competitiveness in handling label shift. In this section, we verify our theoretical results and assess DARSA's efficacy through real-world experiments. We begin by empirically confirming the superiority of the sub-domain-based generalization bound (Theorem 4.10) in Section 6.1. Then, we verify that the assumptions for Theorem 4.10 are empirically satisfied on real-world datasets (details in Appendix B). Next, we demonstrate the vital role of sub-domain weight re-balancing in Section 6.2 and show DARSA's robustness to minor weight estimation discrepancies. Lastly, given that our theoretical analysis guarantees that DARSA should have competitive performance in scenarios where the number of classes is not overwhelming, we evaluate DARSA on real-world datasets with this property. Comparing with other state-of-the-art UDA baselines, we validate our theoretical analysis and demonstrate DARSA's effectiveness in real-world applications, including those in medical settings.
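The sub-domain weight re-balancing mentioned in the row above can be sketched as class-conditional importance weighting under label shift: per-class source losses are re-weighted by the ratio of estimated target class frequencies to source class frequencies. This is a minimal illustrative sketch, not DARSA's implementation; the helper names (`estimate_weights`, `reweighted_loss`) are hypothetical.

```python
import numpy as np

def estimate_weights(source_labels, target_label_probs):
    """Per-class importance weights w_k = p_T(y=k) / p_S(y=k).

    Assumes class labels are 0..K-1; target_label_probs is the
    (estimated) target label distribution indexed by class.
    """
    classes, counts = np.unique(source_labels, return_counts=True)
    p_source = counts / counts.sum()
    return target_label_probs[classes] / p_source

def reweighted_loss(per_sample_losses, source_labels, weights):
    """Weighted mean of per-sample losses; each sample's weight is
    looked up from its class."""
    w = weights[source_labels]
    return np.sum(w * per_sample_losses) / np.sum(w)

# Toy example: source is balanced over 2 classes, target is 80/20,
# so class 0 is up-weighted and class 1 down-weighted.
src_y = np.array([0, 0, 1, 1])
losses = np.array([1.0, 1.0, 2.0, 2.0])
w = estimate_weights(src_y, np.array([0.8, 0.2]))  # -> [1.6, 0.4]
print(reweighted_loss(losses, src_y, w))  # 1.2, pulled toward class-0 loss
```

The re-weighted average (1.2) sits closer to the class-0 loss than the unweighted mean (1.5), reflecting the target's class balance.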
Researcher Affiliation: Collaboration. Yiling Liu (Program in Computational Biology and Bioinformatics, Duke University); Juncheng Dong (Department of Electrical and Computer Engineering, Duke University); Ziyang Jiang (Meta Platforms, Inc.); Ahmed Aloui (Department of Electrical and Computer Engineering, Duke University); Keyu Li (Department of Electrical and Computer Engineering, Duke University); Michael Hunter Klein (Department of Electrical and Computer Engineering, Duke University); Vahid Tarokh (Department of Electrical and Computer Engineering, Duke University); David Carlson (Department of Civil and Environmental Engineering and Department of Biostatistics and Bioinformatics, Duke University).
Pseudocode: Yes. The pseudo-code for DARSA can be found in Appendix D.
Open Source Code: Yes. The code to replicate all experiments is available at: https://github.com/yilingmialiu/DARSA_repo
Open Datasets: Yes. Experiments on the Digits Datasets. In our Digits datasets experiments, we evaluate our performance across four datasets: MNIST (M) (LeCun et al., 1998), MNIST-M (MM) (Ganin et al., 2016), USPS (U), and SVHN (S), all modified to induce label distribution shifts. Experiments on the TST Dataset. We use the Tail Suspension Test (TST) dataset (Gallagher et al., 2017) of local field potentials (LFPs) from 26 mice with two genetic backgrounds: ClockΔ19 (a bipolar disorder model) and wildtype. This dataset is publicly available (Carlson et al., 2023). Our study involves two domain adaptation tasks, predicting the current condition (home cage (HC), open field (OF), or tail suspension (TS)) from one genotype to the other. Experiments on the VisDA-2017 Dataset. We further evaluate DARSA on the large-scale VisDA-2017 dataset (Peng et al., 2017), a challenging synthetic-to-real benchmark with 12 categories. The MNIST, BSDS500, USPS, SVHN, and VisDA-2017 datasets are publicly available with an open-access license. The Tail Suspension Test (TST) dataset (Gallagher et al., 2017) is available to download at https://research.repository.duke.edu/concern/datasets/zc77sr31x?locale=en for free under a Creative Commons BY-NC (Attribution-NonCommercial) 4.0 International license.
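The "modified to induce label distribution shifts" step above can be illustrated by per-class subsampling so that the empirical label frequencies match a chosen target distribution. This is a generic sketch under that assumption, not the paper's exact protocol (the quoted response defers those details to Appendices F, G, and H); the function name is hypothetical.

```python
import numpy as np

def subsample_to_distribution(labels, class_probs, n_total, seed=0):
    """Return indices whose labels follow class_probs (approximately),
    drawing at most n_total samples in total without replacement."""
    rng = np.random.default_rng(seed)
    keep = []
    for k, p in enumerate(class_probs):
        idx = np.flatnonzero(labels == k)
        n_k = min(len(idx), int(round(p * n_total)))  # cap at availability
        keep.append(rng.choice(idx, size=n_k, replace=False))
    return np.concatenate(keep)

# Toy example: a balanced 3-class label array skewed to a 60/30/10 split.
labels = np.repeat([0, 1, 2], 100)
idx = subsample_to_distribution(labels, [0.6, 0.3, 0.1], n_total=100)
print(np.bincount(labels[idx]))  # [60 30 10]
```

Applying such a skew to the source but not the target (or vice versa) creates the label shift the benchmarks evaluate.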
Dataset Splits: No. The paper describes how datasets were modified to induce label distribution shifts and mentions that 'For comprehensive details, refer to Appendix F, G, and H'. However, the main text does not provide specific percentages, absolute sample counts, or explicit train/test/validation split ratios for any of the datasets used, nor does it cite predefined standard splits with sufficient detail in the provided main content.
Hardware Specification: Yes. The experiments are conducted on a computer cluster equipped with an NVIDIA GeForce RTX 2080 Ti that has a memory capacity of 11019 MiB.
Software Dependencies: No. The paper mentions that the code to replicate all experiments is available at a GitHub repository, but it does not explicitly list any specific software dependencies with version numbers (e.g., Python or PyTorch versions) in the main text.
Experiment Setup: Yes. Table 4: Ablation study results. The table presents different configurations and their corresponding prediction accuracy (%) across four experimental setups, with columns Experiment, λ_Y, λ_D, a, c, and Accuracy; one quoted row reads λ_Y = 0.4, λ_D = 0.35, a = 0.9, c = 1, Accuracy 96.0 ... The objective function of DARSA is defined as follows: E = λ_Y L_Y + λ_D L_D + L_C, where L_Y, L_D, L_C are losses described below with relative weights given by λ_Y and λ_D.
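The quoted objective, E = λ_Y L_Y + λ_D L_D + L_C, is a weighted sum of three loss terms, which can be checked with a one-line function. The loss values below are placeholders (in DARSA each term is computed from the networks); only the weights λ_Y = 0.4 and λ_D = 0.35 come from the quoted ablation row.

```python
def darsa_objective(l_y, l_d, l_c, lambda_y, lambda_d):
    """E = lambda_Y * L_Y + lambda_D * L_D + L_C, per the quoted objective.
    L_C carries an implicit weight of 1."""
    return lambda_y * l_y + lambda_d * l_d + l_c

# Placeholder loss values with the ablation-row weights:
print(darsa_objective(l_y=1.0, l_d=2.0, l_c=0.5, lambda_y=0.4, lambda_d=0.35))
# 0.4*1.0 + 0.35*2.0 + 0.5 = 1.6
```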