An Improved Federated Clustering Algorithm with Model-based Clustering

Authors: Harsh Vardhan, Avishek Ghosh, Arya Mazumdar

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the performance of SR-FCA on real-world FL datasets including FEMNIST and Shakespeare in non-convex problems and show the benefits of SR-FCA over several baselines. [...] We implement SR-FCA on a wide variety of simulated heterogeneous datasets (rotated or inverted MNIST, CIFAR10) and real federated datasets (FEMNIST and Shakespeare Caldas et al. (2018)). With the cross-cluster loss distance metric, we compare the test performance of SR-FCA with five baselines: (a) global (one model for all users), (b) local (one model per user), (c) IFCA, (d) CFL Sattler et al. (2021), and (e) Local-KMeans (local models clustered by KMeans on model weights). On simulated datasets, SR-FCA obtains test accuracy no worse than the best baseline and is able to recover the correct clustering. For CIFAR10, in particular, SR-FCA has 5% better test accuracy than IFCA. On real datasets, SR-FCA outperforms all baselines.
Researcher Affiliation | Academia | Harsh Vardhan (EMAIL), Halicioğlu Data Science Institute, University of California, San Diego; Avishek Ghosh (EMAIL), Systems and Control Engg., Centre for Machine Intelligence and Data Sciences, Indian Institute of Technology, Bombay; Arya Mazumdar (EMAIL), Halicioğlu Data Science Institute, University of California, San Diego
Pseudocode | Yes |
Algorithm 1: SR-FCA
  Input: Threshold λ, Size parameter t
  Output: Clustering C_R
  C_0 ← ONE_SHOT(λ, t)
  for r = 1 to R do
    C_r ← REFINE(C_{r-1}, λ)
  end for

ONE_SHOT(λ, t)
  Server:
    for all clients i in parallel do
      Send w_0 to client i; receive w_{i,T} from client i
    end for
    G ← graph with m nodes (one per user) and no edges
    for all pairs of clients i, j ∈ [m], i ≠ j do
      Add edge (i, j) to G if dist(w_{i,T}, w_{j,T}) ≤ λ
    end for
    C_0 ← connected components of G with size ≥ t
  Client(i):
    Receive w_0 from Server; w_{i,T} ← perform T training iterations initialized from w_0; send w_{i,T} to Server

REFINE(C_{r-1}, λ)
  for all clusters c ∈ C_{r-1} do
    ω_{c,T} ← Trimmed_Mean_GD()
  end for
  C′_r ← RECLUSTER(C_{r-1})
  C_r ← MERGE(C′_r, λ, t)

Algorithm 2: RECLUSTER()
  Input: Cluster models {ω_{c,T}}_{c ∈ rg(C_r)}, user models {w_i}_{i=1}^m, clustering C_r
  Output: Improved clustering C′_r
  for all users i ∈ [m] do
    C′_r(i) ← argmin_{c ∈ rg(C_r)} dist(w_i, ω_{c,T})
  end for
  return clustering C′_r

Algorithm 3: Trimmed_Mean_GD()
  Input: 0 ≤ β < 1/2, clustering C_r
  Output: Cluster models {ω_{c,T}}_{c ∈ rg(C_r)}
  Server:
    for all clusters c ∈ rg(C_r) in parallel do
      S_c ← {i ∈ [m] : C_r(i) = c}; ω_{c,0} ← w_0
      for t = 0 to T-1 do
        Send ω_{c,t} to all clients i ∈ S_c
        Receive ∇f_i(ω_{c,t}) from all clients i ∈ S_c
        g(ω_{c,t}) ← TrMean_β({∇f_i(ω_{c,t}) : i ∈ S_c})
        ω_{c,t+1} ← proj_W{ω_{c,t} − η g(ω_{c,t})}
      end for
    end for
    Return {ω_{c,T}}_{c ∈ rg(C_r)}
  Client(i):
    Receive ω_{c,t} from Server; send ∇f_i(ω_{c,t}) to Server

Algorithm 4: MERGE()
  Input: Cluster models {ω_{c,T}}_{c ∈ rg(C_r)}, clustering C′_r, threshold λ, size parameter t
  Output: Merged clustering C_{r+1}, cluster models {ω_{c,T}}_{c ∈ rg(C_{r+1})}
  G ← graph with nodes rg(C′_r) and no edges
  for all pairs of clusters c, c′ ∈ rg(C′_r), c ≠ c′ do
    Add edge (c, c′) to G if dist(ω_c, ω_{c′}) ≤ λ
  end for
  C_temp ← connected components of G with size ≥ t
  For each cluster in C_temp, merge the nodes of its component clusters to obtain C_{r+1}
  for c ∈ rg(C_temp) do
    G_c ← {c′ ∈ rg(C′_r) which merged into c}
    ω_{c,T} ← (1/|G_c|) Σ_{c′ ∈ G_c} ω_{c′,T}
  end for
  return C_{r+1}, {ω_{c,T}}_{c ∈ rg(C_{r+1})}
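The ONE_SHOT step above (threshold graph plus connected components with a size filter) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `dist` here defaults to Euclidean distance on flattened model weights, whereas the paper uses a cross-cluster loss distance, and `one_shot` is a hypothetical helper name.

```python
import numpy as np
from itertools import combinations

def one_shot(models, lam, t, dist=None):
    """Sketch of SR-FCA's ONE_SHOT: connect every pair of locally trained
    models with dist(w_i, w_j) <= lam, then keep connected components of
    size >= t as the initial clusters. `dist` defaults to Euclidean
    distance on weight vectors (a simplification of the paper's
    cross-cluster loss distance)."""
    if dist is None:
        dist = lambda a, b: float(np.linalg.norm(a - b))
    m = len(models)
    adj = {i: set() for i in range(m)}
    # threshold test: add edge (i, j) if dist(w_i, w_j) <= lam
    for i, j in combinations(range(m), 2):
        if dist(models[i], models[j]) <= lam:
            adj[i].add(j)
            adj[j].add(i)
    # connected components via depth-first search
    seen, clusters = set(), []
    for s in range(m):
        if s in seen:
            continue
        comp, stack = [], [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u] - seen:
                seen.add(v)
                stack.append(v)
        if len(comp) >= t:  # size filter: drop components smaller than t
            clusters.append(sorted(comp))
    return clusters
```

For example, four clients whose trained weights concentrate around two values split into two clusters, while an isolated outlier client is dropped by the size filter.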
Open Source Code | No | The total time to run all experiments including hyperparameter tuning on a single NVIDIA GeForce RTX 3090 is 2.5 weeks and the code is provided here.
Open Datasets | Yes | We implement SR-FCA on a wide variety of simulated heterogeneous datasets (rotated or inverted MNIST, CIFAR10) and real federated datasets (FEMNIST and Shakespeare Caldas et al. (2018)).
Dataset Splits | Yes | We set m=100, n=600. We obtain Rotated CIFAR10 by creating 2 clusters with the images rotated by 180 degrees. To test with label heterogeneity, we create Label CIFAR10 with 2 clusters. The first cluster contains the first 7 of the 10 labels and the second cluster contains the last 7 of the 10 labels. We set m=32 for both CIFAR10 datasets and n=3125 and n=4375 for Rotated and Label CIFAR10 respectively. To emulate practical FL scenarios, we assume that only a fraction of the nodes participate in the learning procedure. For Rotated and Inverted MNIST, we assume that all the nodes participate, while for Rotated and Label CIFAR10, 50% of the nodes participate. For MNIST, we train a 2-layer feedforward NN, while for CIFAR10, we train a ResNet9 Page (2019).
Hardware Specification | Yes | The total time to run all experiments including hyperparameter tuning on a single NVIDIA GeForce RTX 3090 is 2.5 weeks and the code is provided here.
Software Dependencies | No | No specific software dependencies with version numbers are explicitly mentioned in the provided text.
Experiment Setup | Yes | For MNIST, we train a 2-layer feedforward NN, while for CIFAR10, we train a ResNet9 Page (2019). We train Rotated MNIST, Inverted MNIST, Rotated CIFAR10, and Label CIFAR10 for 250, 280, 2400 and 2400 iterations respectively, with 2 refine steps for SR-FCA. [...] For FEMNIST, we train a CNN, while for Shakespeare we train a 2-layer stacked LSTM. For clustered FL baselines, we tune K, the number of clusters, with K ∈ {2,3,4,5} for FEMNIST and K ∈ {1,2,3,4}. We run FEMNIST and Shakespeare for 1000 and 2400 iterations respectively and set the number of refine steps to 1 for SR-FCA. [...] For SR-FCA, we tune the parameters λ and β for trimmed mean, set t = 2, and require at most 2 REFINE steps. [...] As our global model corresponds to FedAvg, for a fair comparison we use FedAvg McMahan & Ramage (2017) inside the Trimmed Mean GD subroutine, by applying the TrMean_β operation to the model updates after local steps from different clients.
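The TrMean_β aggregation applied to client model updates can be sketched as a coordinate-wise trimmed mean. This is a minimal illustration under the assumption that each update is a flat NumPy vector; the function name `trimmed_mean` is hypothetical, not from the paper's code.

```python
import numpy as np

def trimmed_mean(updates, beta):
    """Coordinate-wise beta-trimmed mean (TrMean_beta): for each
    coordinate, sort the values across clients, drop the floor(beta * m)
    smallest and largest, and average what remains. Requires
    0 <= beta < 1/2 so at least one value survives per coordinate."""
    m = len(updates)
    k = int(np.floor(beta * m))          # number trimmed from each end
    x = np.sort(np.stack(updates), axis=0)  # sort each coordinate across clients
    return x[k:m - k].mean(axis=0)          # average the retained middle values
```

With β = 0.25 and four clients, one extreme value is dropped from each end per coordinate, so a single corrupted or outlier client update cannot skew the aggregated step; with β = 0 this reduces to plain averaging, i.e. the FedAvg update.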