FedPHA: Federated Prompt Learning for Heterogeneous Client Adaptation
Authors: Chengying Fang, Wenke Huang, Guancheng Wan, Yihao Yang, Mang Ye
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results validate the effectiveness of FedPHA in achieving a balance between global and personalized knowledge in federated learning scenarios. In this section, we conduct extensive experiments aimed at answering the following research questions: Q1: Does the proposed method maintain its effectiveness when the prompt length is fixed? How does it compare to the state-of-the-art (SOTA) methods? (in Sec 4.2) Q2: For clients with diverse data distributions, can prompts of varying lengths enhance performance? Does length heterogeneity provide any advantage? (in Sec 4.3) |
| Researcher Affiliation | Academia | National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China. Correspondence to: Mang Ye <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Overall Procedure of FedPHA |
| Open Source Code | Yes | The source code is available at: https://github.com/CYFang6/FedPHA. |
| Open Datasets | Yes | We use five visual classification datasets, Food101 (Bossard et al., 2014), DTD (Cimpoi et al., 2014), Caltech101 (Fei-Fei, 2004), Flowers102 (Nilsback & Zisserman, 2008), and Oxford Pets (Parkhi et al., 2012), collectively referred to as the CLIP dataset (1 domain). In addition, we select two cross-domain datasets, Office31 (Saenko et al., 2010) (3 domains) and Office-Home (Venkateswara et al., 2017) (4 domains)... Finally, we employ two classic image benchmark datasets, CIFAR-10 (Krizhevsky et al., 2010) and CIFAR-100 (Krizhevsky & Hinton, 2009)... |
| Dataset Splits | Yes | We evaluate the models on each client's private test data, which follows the same distribution as its training set. These datasets are configured using a pathological non-IID setting, where each client is randomly allocated a distinct number of non-overlapping classes to simulate heterogeneous data distributions. Multi-domain datasets (Office31, Office-Home) set N to twice the number of domains, assigning each domain's data to two clients. CIFAR-10 and CIFAR-100 use N = 100, with each client randomly assigned 10% of the dataset. For CIFAR-10 and CIFAR-100, data is randomly partitioned among clients using a symmetric Dirichlet distribution as in (Cao et al., 2023; Shamsian et al., 2021) with β = 0.5. Table 6 provides a detailed overview, including the original task, number of classes, training and test sample sizes, and domain counts. |
| Hardware Specification | Yes | We conducted all experiments with PyTorch (Paszke et al., 2019) on NVIDIA RTX 3090 GPUs. |
| Software Dependencies | Yes | We conducted all experiments with PyTorch (Paszke et al., 2019) on NVIDIA RTX 3090 GPUs. |
| Experiment Setup | Yes | Local training rounds are set to E = 1 and federated communication rounds to R = 50, except for CIFAR-10 and CIFAR-100, where R = 25. Final performance is averaged over the last 10 communication rounds. For learnable prompts, the default length is 16 with a 512-dimensional representation. In heterogeneous settings, local prompt lengths range from 4 to 32, while the global prompt length remains 16. The batch size is 32 for training and 128 for testing. For hyperparameter settings, the ratio (ρ in Eq.(9)) defaults to 0.8, and alpha (α in Eq.(12)) to 1. The optimizer used is Stochastic Gradient Descent (SGD) (Robbins & Monro, 1951) with a learning rate of η = 0.001. All input images are resized to 224×224 pixels and further divided into 14×14 patches with a dimension of 768. |
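The symmetric Dirichlet partition with β = 0.5 described in the Dataset Splits row can be sketched as follows. This is a minimal stdlib-only illustration, not the paper's code: the helper name `dirichlet_partition` and the Gamma-based Dirichlet sampling are our assumptions about the standard construction used in (Cao et al., 2023; Shamsian et al., 2021).

```python
import random

def dirichlet_partition(labels, num_clients, beta=0.5, seed=0):
    """Split sample indices among clients: for each class, draw client
    shares from a symmetric Dirichlet(beta) and slice the class's
    (shuffled) indices accordingly. Smaller beta -> more skewed clients."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    clients = [[] for _ in range(num_clients)]
    for idx in by_class.values():
        rng.shuffle(idx)
        # Symmetric Dirichlet(beta) via normalized Gamma(beta, 1) draws.
        gammas = [rng.gammavariate(beta, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        start, cum = 0, 0.0
        for k in range(num_clients):
            cum += gammas[k] / total
            # Last client takes the remainder so every index is assigned.
            end = len(idx) if k == num_clients - 1 else round(cum * len(idx))
            clients[k].extend(idx[start:end])
            start = end
    return clients

# Example: 1000 samples over 10 balanced classes, split among 5 clients.
labels = [i % 10 for i in range(1000)]
parts = dirichlet_partition(labels, num_clients=5, beta=0.5)
```

Each client's local test split would then follow the same per-client distribution as its training indices, matching the evaluation protocol quoted above.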