Federated Unsupervised Domain Generalization Using Global and Local Alignment of Gradients
Authors: Farhad Pourpanah, Mahdiyar Molahasani, Milad Soltany, Michael Greenspan, Ali Etemad
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To empirically evaluate our method, we perform various experiments on four commonly used multi-domain datasets: PACS, OfficeHome, DomainNet, and TerraInc. The results demonstrate the effectiveness of our method, which outperforms comparable baselines. Ablation and sensitivity studies demonstrate the impact of different components and parameters in our approach. |
| Researcher Affiliation | Academia | Queen's University, Canada |
| Pseudocode | Yes | Algorithm 1: FedGaLA |
| Open Source Code | Yes | Code: https://github.com/MahdiyarMM/FedGaLA |
| Open Datasets | Yes | We verify the performance of our approach using four public datasets, PACS (Yu et al. 2022), OfficeHome (Venkateswara et al. 2017), DomainNet (Peng et al. 2019), and TerraInc (Beery, Van Horn, and Perona 2018), and demonstrate strong performance in federated unsupervised domain generalization in comparison to various baselines. |
| Dataset Splits | Yes | We use the leave-one-domain-out setting used in prior works (Zhang et al. 2023a, 2022; Gulrajani and Lopez-Paz 2021). This involves selecting one domain as the target, training the model on the rest of the domains, and then testing the model's performance on the selected target domain. Linear evaluation, a common feature evaluation approach, is utilized to evaluate the quality of learned representations (Feng, Xu, and Tao 2019; Zhang, Isola, and Efros 2017; Kolesnikov, Zhai, and Beyer 2019). For linear evaluation, following (van Berlo, Saeed, and Ozcelebi 2020; Zhuang, Wen, and Zhang 2022), we utilize 10% and 30% of the target data to train the linear classifier and evaluate on the remaining 90% and 70% of the data, respectively. |
| Hardware Specification | Yes | All experiments were implemented using PyTorch and trained on 8 NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions "PyTorch" but does not specify a version number for it or any other software, making the dependency description not reproducible. |
| Experiment Setup | Yes | We use SimCLR as the SSL module in FedGaLA due to its performance on domain generalization problems, as previously shown (Zhang et al. 2022). Following (Zhang et al. 2022), ResNet-18 (He et al. 2016) is employed as the encoder network architecture for all experiments, which we train from scratch. We present details regarding data augmentations, projector architecture, and encoder hyperparameters in Appendix B. Following (Feng, Xu, and Tao 2019; Zhang, Isola, and Efros 2017), we first learn a representation with FedGaLA and the baseline models for 100 communication rounds with 7 local epochs. Next, we freeze the backbone model and train a linear classifier for 100 epochs to perform prediction on the target domain. Appendix B presents all the hyperparameters used for FedGaLA. For FedEMA, we use the hyperparameters reported in (Zhuang, Wen, and Zhang 2022). All experiments were implemented using PyTorch and trained on 8 NVIDIA GeForce RTX 3090 GPUs. For each experiment, we train the models three times with random initialization seeds and report the average. |
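The evaluation protocol quoted in the Dataset Splits and Experiment Setup rows can be sketched as follows. This is a minimal illustration only, not the paper's released code: the function names and the PACS domain labels used in the example are our assumptions.

```python
from typing import Iterator, List, Tuple

def leave_one_domain_out(domains: List[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (target, sources) folds: each domain in turn is held out as the
    unseen target, and training uses all remaining source domains."""
    for target in domains:
        sources = [d for d in domains if d != target]
        yield target, sources

def linear_eval_split(n_samples: int, train_frac: float = 0.1) -> Tuple[List[int], List[int]]:
    """Split target-domain sample indices for linear evaluation: a small
    labeled fraction (10% or 30% in the paper) trains the linear classifier
    on top of the frozen backbone; the rest is held out for testing."""
    n_train = int(n_samples * train_frac)
    indices = list(range(n_samples))
    return indices[:n_train], indices[n_train:]

# Example with PACS-style domain names (assumed labels):
pacs = ["photo", "art_painting", "cartoon", "sketch"]
folds = list(leave_one_domain_out(pacs))
train_idx, test_idx = linear_eval_split(1000, train_frac=0.1)
```

In the paper's protocol, each fold would involve 100 communication rounds of federated self-supervised pretraining on the source domains before the linear-evaluation split is applied to the held-out target domain.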