FedCFA: Alleviating Simpson’s Paradox in Model Aggregation with Counterfactual Federated Learning

Authors: Zhonghua Jiang, Jimin Xu, Shengyu Zhang, Tao Shen, Jiwei Li, Kun Kuang, Haibin Cai, Fei Wu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on six datasets and verify that our method outperforms other FL methods in terms of efficiency and global model accuracy under limited communication rounds.
Researcher Affiliation | Academia | 1 Zhejiang University, 2 East China Normal University, EMAIL, EMAIL
Pseudocode | Yes | The main steps of our proposed FedCFA framework are shown in Algorithm 1, where T is the total communication rounds set for FL, and w_k^t is the model parameters of client k in the t-th round.
Open Source Code | No | The paper does not contain an explicit statement about the release of source code or a link to a code repository.
Open Datasets | Yes | Datasets: CIFAR10, CIFAR100 (Krizhevsky, Hinton et al. 2009), Tiny-ImageNet, FEMNIST (Caldas et al. 2018), Sent140 (Go, Bhayani, and Huang 2009), MNIST. We built a dataset with Simpson's Paradox based on MNIST.
Dataset Splits | Yes | We use two different data partition methods: IID and Non-IID. IID partition distributes samples uniformly to K clients through random sampling; we use IID_K to represent this data division. For Non-IID, we utilize the Dirichlet distribution Dir_K(α) to simulate dataset imbalance. The smaller the α, the greater the data difference between clients. We try several different client numbers and data partition methods: Dir60(0.6), Dir60(0.2), Dir100(0.2), Dir100(0.6), IID60, IID100. We use the Dirichlet distribution to adjust the frequency of different category labels in each client to simulate label distribution P(Y) heterogeneity among clients. For FEMNIST, we divide different users into different clients to simulate feature distribution P(X) heterogeneity due to handwriting style variance. For the binary classification text dataset Sent140, we divide it into different clients based on users and ensure consistent label distribution among clients, to simulate heterogeneity of the conditional feature distribution P(X|Y).
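The Dirichlet-based Non-IID split quoted above is a standard partitioning recipe: for each class, a Dirichlet draw over clients decides what fraction of that class each client receives. A minimal sketch of that idea (the function name `dirichlet_partition` and the toy label array are assumptions for illustration, not from the paper):

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients, with each class's share per
    client drawn from Dir(alpha); smaller alpha -> more heterogeneity."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Proportions of this class assigned to each client.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return [np.array(ix) for ix in client_indices]

# Example matching the paper's Dir60(0.2) setting on toy 10-class labels.
labels = np.repeat(np.arange(10), 600)
parts = dirichlet_partition(labels, num_clients=60, alpha=0.2)
```

With α = 0.6 the same call yields noticeably more balanced client label distributions, matching the paper's statement that smaller α means greater inter-client difference.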
Hardware Specification | Yes | We conduct experiments on an NVIDIA A100 with 40GB memory.
Software Dependencies | No | Using FedLab (Zeng et al. 2023), we build a typical FL scenario. No specific version numbers for FedLab or other software dependencies (e.g., Python, PyTorch) are provided.
Experiment Setup | Yes | Unless specified otherwise, we use MLP, ResNet18 and LSTM as network models, with 60 clients, a learning rate β of 0.01, one local epoch, batch size of 128, and 500 communication rounds.