BSemiFL: Semi-supervised Federated Learning via a Bayesian Approach
Authors: Haozhao Wang, Shengyu Wang, Jiaming Li, Hao Ren, Xingshuo Han, Wenchao Xu, Shangwei Guo, Tianwei Zhang, Ruixuan Li
ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper first theoretically and empirically demonstrates that the local model achieves higher re-labeling accuracy over local data, while the global model can progressively improve re-labeling performance by introducing the extra knowledge of other clients. Based on these observations, we propose BSemiFL, which re-labels the local data via collaboration between the local and global models in a Bayesian approach. ... Experiments show that BSemiFL improves the performance by up to 9.8% as compared to existing methods. |
| Researcher Affiliation | Academia | 1School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China 2Division of Integrative Systems and Design, Hong Kong University of Science and Technology, Hong Kong, China 3School of Cyber Science and Engineering, Sichuan University, Chengdu, China 4College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China 5College of Computer Science, Chongqing University, Chongqing, China 6College of Computing and Data Science, Nanyang Technological University, Singapore, Singapore. Correspondence to: Ruixuan Li <EMAIL>. |
| Pseudocode | Yes | B. Algorithm Workflow. The workflow is presented in Algorithm 1. In lines 4-5, the server first trains the global model on the labeled dataset and then calculates the sum of sampling probability. Next, in lines 6-8, the server distributes the global model and probability sum to all selected clients. Each client conducts two main steps. Labeling local data (lines 14-21): 1. each client m calculates the closeness of the global model (lines 16-17); 2. each client m calculates the closeness of the local model (lines 18-19); 3. each client m applies an ensemble of the global and local models to construct the labeled dataset (lines 20-22). Local training (lines 24-25): 1. each client m performs local training on the labeled dataset for E local epochs; 2. each client m pushes the local model to the server. Finally, in lines 9-11, the server aggregates the local models of all participating clients into the global model. Algorithm 1 (workflow of BSemiFL). Input: T: rounds; M: client number; η: learning rate. Initialize the parameters w_0. In server, for t = 1 to T: fine-tune the global model w_t on the dataset S; calculate the sum of sampling probability of the server dataset S for all K classes: Q_s^k = Σ_{i=1}^{N_s} f_p̂(k\|x_i, w_t); randomly select M_t clients; for each selected client m in parallel: send the global model w_t and probability vector Q_s, receive the local model w_t^m; aggregate local models: w_{t+1} = Σ_{m=1}^{M_t} (S_m / S) w_t^m. In client m, for each sample x_i^m ∈ U_m: calculate the output of the global model f_p̂(x_i^m, w_t) and of the local model f_p̂m(x_i^m, w_t^m); calculate the conditional probability of the global model: p̂(x_i^m\|k) = f_p̂(k\|x_i^m, w_t) / (f_p̂(k\|x_i^m, w_t) + Q_s^k) for all classes k = 1, ..., K; calculate the closeness of the global model: p̂(x_i^m) = Σ_{k=1}^K p̂(x_i^m\|k) p̂_s(k); calculate the conditional probability of the local model: p̂_m(x_i^m\|k) = f_p̂m(k\|x_i^m, w_t^m) / Σ_{i=1}^{S_m} f_p̂m(k\|x_i, w_t^m) for all K classes; calculate the closeness of the local model: p̂_m(x_i^m) = Σ_{k=1}^K p̂_m(x_i^m\|k) p̂_m(k); normalize the weights: α_i^m = p̂(x_i^m) / (p̂(x_i^m) + p̂_m(x_i^m)), 1 − α_i^m = p̂_m(x_i^m) / (p̂(x_i^m) + p̂_m(x_i^m)); apply an ensemble of the global and local models: ŷ_i^m = α_i^m f_p̂(x_i^m, w_t) + (1 − α_i^m) f_p̂m(x_i^m, w_t^m); construct the dataset S_m by conducting (2); update the local model w_t^m for E local epochs on the labeled dataset S_m: w_t^m ← w_t^m − η ∇_{w_t^m} L_m; send the model w_t^m to the server. |
| Open Source Code | No | The paper does not contain any explicit statements about code release or links to a code repository. |
| Open Datasets | Yes | Datasets and Models. We consider three popular datasets in experiments, i.e., SVHN (Netzer et al., 2011), CIFAR-10 (Krizhevsky et al., 2009), and CIFAR-100 (Krizhevsky et al., 2009), which contain 10, 10, and 100 classes, respectively. |
| Dataset Splits | Yes | Data Partition. We adopt two Non-IID data partition methods: Shards (McMahan et al., 2017) and Dirichlet (Lin et al., 2020). In the Shards setting, the sorted samples are shuffled into M·S shards and assigned to M clients randomly. The Dirichlet distribution uses α to characterize the degree of heterogeneity. We set α of Dirichlet to {0.1, 1, 10} and shards per client to {2, 4, 8}. |
| Hardware Specification | Yes | Implementation. We implement the whole experiment in a simulation environment based on PyTorch 2.0 and 4 NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | Yes | Implementation. We implement the whole experiment in a simulation environment based on PyTorch 2.0 and 4 NVIDIA GeForce RTX 3090 GPUs. |
| Experiment Setup | Yes | Implementation. We use 100 clients in total and randomly choose 10% each round for local training. We set the local epoch to 5, batch size to 10, and learning rate to 3.0e-2. We employ the SGD optimizer with momentum of 0.9 and weight decay of 5e-4 for all methods and datasets. The number of global communication rounds is 800. Each experiment is run 3 times and we take each run's final 10 rounds of accuracy to calculate the average value and standard deviation. We set the threshold of our method to 0.7. For SemiFL and FedMatch, we adopt the same thresholds as leveraged in their original papers, i.e., 0.95 for both SemiFL and FedMatch. |
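The Bayesian re-labeling step quoted in the Pseudocode row (closeness of the global and local models, per-sample ensemble weight α, thresholded pseudo-labels) can be sketched roughly as below. This is an illustrative NumPy sketch, not the authors' code: all function and argument names are assumptions, the class priors `p_s` and `p_m` are taken as given inputs, and the per-class sum over the client's labeled set S_m is approximated by a sum over the batch passed in.

```python
import numpy as np

def bayesian_relabel(global_probs, local_probs, Q_s, p_s, p_m, tau=0.7):
    """Sketch of BSemiFL's client-side re-labeling (hypothetical names).

    global_probs : (N, K) softmax outputs of the global model on unlabeled data
    local_probs  : (N, K) softmax outputs of the local model
    Q_s          : (K,) per-class probability sums over the server's labeled set
    p_s, p_m     : (K,) class priors of the server / client data (assumed given)
    tau          : pseudo-label confidence threshold (0.7 in the paper)
    """
    # Closeness of the global model: p(x|k) then p(x) = sum_k p(x|k) p_s(k)
    cond_g = global_probs / (global_probs + Q_s)
    close_g = (cond_g * p_s).sum(axis=1)

    # Closeness of the local model: normalize per class over the client's
    # samples (stand-in for the paper's sum over S_m), then mix with p_m(k)
    cond_l = local_probs / local_probs.sum(axis=0)
    close_l = (cond_l * p_m).sum(axis=1)

    # Per-sample ensemble weight alpha, normalized Bayesian-style
    alpha = (close_g / (close_g + close_l))[:, None]
    ensemble = alpha * global_probs + (1.0 - alpha) * local_probs

    # Keep only confident pseudo-labels
    labels = ensemble.argmax(axis=1)
    mask = ensemble.max(axis=1) >= tau
    return labels, mask
```

The key design point quoted from the paper is that α is computed per sample, so each unlabeled example leans toward whichever model is "closer" to the data distribution it came from.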
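The server-side aggregation step in Algorithm 1 is truncated in this extraction; it appears to be the standard sample-size-weighted average (FedAvg-style), w_{t+1} = Σ_m (S_m / S) w_t^m. Under that assumption it reduces to a one-liner; the function name and flat-vector weight representation here are illustrative only.

```python
import numpy as np

def aggregate(local_weights, dataset_sizes):
    """Sample-size-weighted model averaging (assumed FedAvg-style):
    each client's model is weighted by its labeled-dataset size S_m."""
    sizes = np.asarray(dataset_sizes, dtype=float)
    coeffs = sizes / sizes.sum()  # S_m / S, summing to 1
    return sum(c * w for c, w in zip(coeffs, local_weights))
```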