Causality Inspired Federated Learning for OOD Generalization

Authors: Jiayuan Zhang, Xuefeng Liu, Jianwei Niu, Shaojie Tang, Haotian Yang, Xinghao Wu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To evaluate the effectiveness of our method, we conduct experiments on both the spurious correlation datasets, i.e., Waterbirds (Sagawa et al., 2019) and Colored MNIST/FMNIST (Arjovsky et al., 2019; Ahuja et al., 2020), and the cross-domain datasets, i.e., Digits and PACS (Li et al., 2017)."
Researcher Affiliation | Academia | "1 State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University; 2 Zhongguancun Laboratory, Beijing, China; 3 Department of Management Science and Systems, School of Management, Center for AI Business Innovation, University at Buffalo, Buffalo, NY, USA. Correspondence to: Jianwei Niu <EMAIL>, Xuefeng Liu <EMAIL>."
Pseudocode | Yes | "Algorithm 1 Fed Uni"
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | "To evaluate the effectiveness of our method, we conduct experiments on both the spurious correlation datasets, i.e., Waterbirds (Sagawa et al., 2019) and Colored MNIST/FMNIST (Arjovsky et al., 2019; Ahuja et al., 2020), and the cross-domain datasets, i.e., Digits and PACS (Li et al., 2017)."
Dataset Splits | Yes | "The test results are based on the model that performs best on the validation set sampled from the training data with a sampling ratio of 0.2. For the Digits and PACS datasets, we follow the leave-one-out rule: we choose one domain as the target domain, train the model on all remaining domains, and evaluate on the target domain. We conduct experiments on 10 clients; the data on each client are sampled from a single domain, and different clients may share a domain. For example, the Digits dataset has 5 domains in total, with training clients covering 4 of them. The number of clients in each training domain is 2, 2, 2, and 4, respectively, and each domain contains 1000 training samples. The PACS dataset has 4 domains in total, with training clients covering 3 of them. The number of clients in each training domain is 3, 3, and 4, respectively, and each domain contains 750 training samples."
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions models such as ResNet-18, MLP, and AlexNet but does not specify software dependencies with version numbers (e.g., PyTorch, Python, or CUDA versions).
Experiment Setup | Yes | "Unless otherwise mentioned, the local update step is 5 and the mini-batch size is 64. The learning rate lr = 1.41 × 10⁻⁴. In our setting, α, β, λ are set to 0.1, 1, 0.01, respectively."
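The leave-one-out client split described in the Dataset Splits row can be sketched in code. This is a minimal illustration, not the authors' implementation: the function and parameter names (`make_federated_split`, `clients_per_domain`, etc.) are invented here, and the equal-shard division of a domain's samples across its clients is an assumption the paper does not spell out.

```python
import random

def make_federated_split(domains, target, clients_per_domain,
                         n_train, val_ratio=0.2, seed=0):
    """Sketch of the leave-one-out split: one domain is held out as the
    target (test) domain, and clients are spread over the remaining
    training domains. All names here are illustrative."""
    rng = random.Random(seed)
    train_domains = [d for d in domains if d != target]
    clients = []
    for dom, n_clients in zip(train_domains, clients_per_domain):
        pool = list(domains[dom])
        rng.shuffle(pool)
        pool = pool[:n_train]            # e.g. 1000 samples per Digits domain
        shard = len(pool) // n_clients   # equal shard per client (assumption)
        for c in range(n_clients):
            data = pool[c * shard:(c + 1) * shard]
            n_val = int(len(data) * val_ratio)  # 0.2 validation sampling ratio
            clients.append({"domain": dom,
                            "val": data[:n_val],
                            "train": data[n_val:]})
    return clients, list(domains[target])  # target domain = held-out test set
```

For the Digits configuration this yields 10 clients over 4 training domains (2, 2, 2, and 4 clients per domain), with the fifth domain reserved entirely for evaluation.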
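The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration fragment. The dictionary below is a hypothetical sketch; the key names are illustrative and not taken from the paper.

```python
# Hypothetical training configuration assembling the stated hyperparameters.
TRAIN_CONFIG = {
    "local_steps": 5,   # local update steps per communication round
    "batch_size": 64,   # mini-batch size
    "lr": 1.41e-4,      # learning rate, 1.41 x 10^-4
    "alpha": 0.1,       # α
    "beta": 1.0,        # β
    "lambda": 0.01,     # λ
}
```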