Straggler-Resilient Personalized Federated Learning
Authors: Isidoros Tziotis, Zebang Shen, Ramtin Pedarsani, Hamed Hassani, Aryan Mokhtari
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results support our theoretical findings showing the superiority of our method over alternative personalized federated schemes in system and data heterogeneous environments. |
| Researcher Affiliation | Academia | Isidoros Tziotis, The University of Texas at Austin; Zebang Shen, ETH Zurich; Ramtin Pedarsani, The University of California, Santa Barbara; Hamed Hassani, The University of Pennsylvania; Aryan Mokhtari, The University of Texas at Austin |
| Pseudocode | Yes | Algorithm 1 SRPFL; Algorithm 2 FedRep-SRPFL (Linear Representation) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We include five datasets in our empirical study: CIFAR10, CIFAR100, EMNIST, FEMNIST and Sent140. Experiments on these datasets support our theoretical results, showing that: (i) SRPFL significantly boosts the performance of different subroutines designed for personalized FL in both full and partial participation settings and (ii) SRPFL exhibits superior performance in system and data heterogeneous settings compared to state-of-the-art baselines. |
| Dataset Splits | No | To ensure that our data allocation is heterogeneous we randomly split the data points among the clients in a way that each client can only observe a specific subset of classes. For instance, in the CIFAR10 dataset where there are in total 10 different classes of data points, each client is only assigned data from 5 different classes, which we refer to as Shards; see first column of Figure 3. We also make sure that the test set for each client is consistent with the samples they have access to at training time, e.g., if client i only observes samples with labels 1, 4, 5 during training, then at test time they are only asked to classify samples from the same classes. The paper details data allocation per client and test set consistency but does not specify global or local train/validation/test split percentages or exact counts. |
| Hardware Specification | No | The paper discusses the computational power of client devices in the federated learning framework but does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experimental simulations. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1) needed to replicate the experiment. |
| Experiment Setup | Yes | Hyperparameters and choice of models. We set the hyperparameters following the work of (Collins et al., 2021). Specifically, for the implementation of FedRep and FedRep-SRPFL we use SGD with momentum, where the momentum parameter is set to 0.5 and the local learning rate to 0.1. Further, similarly to (Collins et al., 2021), we set the local learning rate to 0.1 for all other methods under consideration, which obtains optimal performance. We fix the batch size to 10 for all our implementations. The number of local epochs is set to 1 for CIFAR10 with N = 100 and to 5 for the rest of the datasets. In terms of the choice of the neural network model, for CIFAR10 we use LeNet-5, including two convolution layers with (64, 64) channels and three fully connected layers where the numbers of hidden neurons are (120, 64). The same structure is used for CIFAR100, but the numbers of channels in the convolution layers are increased to (64, 128) and the numbers of hidden neurons are increased to (256, 128). Additionally, a dropout layer with parameter 0.6 is added after the first two fully connected layers, which improves the testing accuracy. For EMNIST and FEMNIST, we use an MLP with three hidden layers with (512, 256, 64) hidden neurons. For Sent140 we use a two-layer bidirectional LSTM with dimension 256 and dropout rate 0.5, followed by a fully connected layer of dimension 5 and a classification head. Further, we use the standard GloVe embedding of dimension 100 and a vocabulary of size 10000. For the needs of SRPFL, we split the neural network model into two parts, the customized head hi and the common representation φ. In our experiments, we simply take the customized head to be the last hidden layer and treat the rest of the parameters as the common representation. Note that LG-FedAvg and LG-FLANP have a different head/representation split scheme: the head is globally shared across all clients while a local version of the representation is maintained on every client. For all included datasets, i.e. CIFAR10, CIFAR100, EMNIST, FEMNIST and Sent140, the common head includes the last two fully connected layers and the rest of the layers are treated as the representation part. |
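The shard-based heterogeneous allocation described in the Dataset Splits row can be sketched in plain Python. This is a minimal illustration, not the authors' released code: the function name, the seed, and the toy label list in the usage below are assumptions for the sketch.

```python
import random

def shard_split(labels, num_clients, shards_per_client, seed=0):
    """Assign each client sample indices drawn from a fixed subset of classes.

    Each client observes only `shards_per_client` distinct classes ("shards"),
    making the allocation heterogeneous, as in the paper's CIFAR10 setup
    (10 classes, 5 shards per client).
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    # Group sample indices by class label.
    by_class = {c: [i for i, y in enumerate(labels) if y == c] for c in classes}
    allocation = {}
    for client in range(num_clients):
        # Restrict each client to a random subset of classes.
        client_classes = rng.sample(classes, shards_per_client)
        idx = [i for c in client_classes for i in by_class[c]]
        rng.shuffle(idx)
        allocation[client] = {"classes": set(client_classes), "indices": idx}
    return allocation
```

Test-time consistency then follows by drawing each client's test samples only from `allocation[client]["classes"]`, matching the paper's description that a client is evaluated only on the classes it saw during training.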
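The head/representation split used by FedRep-style methods (and inverted by LG-FedAvg/LG-FLANP) can be illustrated with a plain-Python parameter dictionary; the layer names and placeholder values below are hypothetical stand-ins, not the actual network parameters.

```python
def split_head_representation(params, head_layers):
    """Partition named parameter groups into a personalized head and a
    shared representation.

    `params` maps layer names to parameter blobs (here, plain lists);
    `head_layers` names the layers kept local to each client. In the
    paper's SRPFL setup the head is the last hidden layer and everything
    else forms the shared representation, whereas LG-FedAvg/LG-FLANP
    share the head globally and keep the representation local.
    """
    head = {k: v for k, v in params.items() if k in head_layers}
    representation = {k: v for k, v in params.items() if k not in head_layers}
    return head, representation
```

During training, only the `representation` part is aggregated across clients while each client updates its own `head` locally (or vice versa for the LG-style methods).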