Federated Deconfounding and Debiasing Learning for Out-of-Distribution Generalization
Authors: Zhuang Qi, Sijin Zhou, Lei Meng, Han Hu, Han Yu, Xiangxu Meng
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on two benchmark datasets demonstrate that FedDDL significantly enhances the model's ability to focus on main objects in unseen data, yielding 4.5% higher Top-1 accuracy on average over 9 state-of-the-art methods. The experiments include performance comparisons, ablation studies, and case studies with visual attention visualizations that investigate the association between backgrounds and labels. |
| Researcher Affiliation | Academia | 1 School of Software, Shandong University, China; 2 AIM Lab, Faculty of Engineering, Monash University, Clayton, VIC, Australia; 3 School of Information and Electronics, Beijing Institute of Technology, China; 4 College of Computing and Data Science, Nanyang Technological University, Singapore |
| Pseudocode | Yes | Algorithm 1 FedDDL — 1: Initialize the global model parameter θ_0. 2: for t = 1, …, T do: 3: sample a subset K of clients with \|K\| = k; 4: for each client k ∈ K in parallel do: 5: initialize the local model θ_k^t = θ^{t−1}; 6: generate counterfactual samples D_{C,k} from object images I_{O,k} and background images I_{B,k}; 7: for e = 1, …, E do: 8: sample batches ζ_1, ζ_2 from local data D_k and counterfactual data D_{C,k}; 9–13: if t = 1 then g_k = ∇L_J(ζ_1, ζ_2), else g_k = ∇L_J(ζ_1, ζ_2) + λ∇L_CR(ζ_1, ζ_2, U_G); 14: update θ_k^t ← θ_k^t − η_l·g_k; 15: end for; 16: U_{L,t}^{i,k} = (1/\|I_O^{i,k}\|) Σ PMG(θ^{t−1}, I_O^{i,k}) for i = 1, …, N; 17: end for; 18: θ^t = (1/k) Σ_{k∈K} θ_k^t; 19: U_{G,t}^i = (1/k) Σ_{k∈K} U_{L,t}^{i,k} for i = 1, …, N; 20: end for. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It does not contain a specific repository link, an explicit code release statement, or mention code in supplementary materials. |
| Open Datasets | Yes | Following previous work on OOD generalization [Qi and et al., 2024; Wang et al., 2022], experiments are conducted on two datasets: NICO-Animal and NICO-Vehicle [Wang et al., 2021]. The statistics and partitioning method of the datasets can be found in Table 1. |
| Dataset Splits | Yes | NICO-Animal (F7): 10 classes, 10,633 training / 2,443 testing; NICO-Animal (L7): 10 classes, 8,311 training / 4,765 testing; NICO-Vehicle (F7): 10 classes, 8,027 training / 3,626 testing; NICO-Vehicle (L7): 10 classes, 8,352 training / 3,301 testing. F7 means data from the first seven backgrounds of each class is used as the training set; L7 means data from the last seven backgrounds of each class is used as the training set. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions ResNet-18 as the backbone and Grounding DINO as a pre-trained model, but does not provide specific software dependencies (e.g., Python libraries like PyTorch or TensorFlow) with their version numbers. |
| Experiment Setup | Yes | We set the local training epochs to 10 per global round for both datasets. The total number of communication rounds is 50, with 7 clients for both datasets. We used a client sampling fraction of 1.0 and employed SGD as the optimizer. During local training, the weight decay is set to 0.01, the batch size is 64, and the initial learning rate is 0.01 for both datasets. λ is chosen from the set {0.1, 1.0, 2.0}. τ is selected from {0.5, 0.07}. η is tuned from {1, 3, 5}. |
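Since the paper releases no code, the training procedure in Algorithm 1 can only be sketched. Below is a minimal, hypothetical FedAvg-style loop using the reported hyperparameters (T = 50 rounds, E = 10 local epochs, 7 clients, sampling fraction 1.0, SGD with learning rate 0.01, λ from {0.1, 1.0, 2.0}). The joint loss L_J and causal regularizer L_CR are replaced by toy stub gradients, and all names (`fed_ddl_train`, `local_update`, `stub_gradient`) are illustrative assumptions, not the authors' implementation.

```python
import random

ROUNDS = 50          # T: communication rounds (paper setting)
LOCAL_EPOCHS = 10    # E: local epochs per global round (paper setting)
NUM_CLIENTS = 7      # k: clients, with sampling fraction 1.0
LR = 0.01            # eta_l: initial local learning rate (SGD)
LAMBDA = 1.0         # lambda: L_CR weight, chosen from {0.1, 1.0, 2.0}


def stub_gradient(theta, use_causal_reg):
    """Stand-in for g_k = grad L_J (+ lambda * grad L_CR).

    Here L_J is mimicked by the quadratic ||theta||^2 and L_CR by a
    small extra shrinkage term; the real losses are defined in the paper.
    """
    g = [2.0 * w for w in theta]
    if use_causal_reg:  # L_CR is only active after the first round (t > 1)
        g = [gi + LAMBDA * 0.1 * wi for gi, wi in zip(g, theta)]
    return g


def local_update(theta_global, round_idx):
    """One client's E local SGD steps, starting from the global model."""
    theta = list(theta_global)  # line 5: theta_k^t = theta^{t-1}
    for _ in range(LOCAL_EPOCHS):
        g = stub_gradient(theta, use_causal_reg=round_idx > 1)
        theta = [w - LR * gi for w, gi in zip(theta, g)]  # line 14
    return theta


def fed_ddl_train(dim=4, seed=0):
    """Run the outer federated loop and return the final global model."""
    random.seed(seed)
    theta = [random.uniform(-1.0, 1.0) for _ in range(dim)]  # theta_0
    for t in range(1, ROUNDS + 1):
        client_models = [local_update(theta, t) for _ in range(NUM_CLIENTS)]
        # line 18: server aggregation, theta^t = (1/k) * sum over clients
        theta = [sum(ws) / NUM_CLIENTS for ws in zip(*client_models)]
    return theta
```

The counterfactual-sample generation (line 6) and the prototype aggregation U_G (lines 16 and 19) are omitted here; they would require the object/background segmentations produced with Grounding DINO, which this sketch does not model.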