Distributed Architecture Search Over Heterogeneous Distributions

Authors: Erum Mushtaq, Chaoyang He, Jie Ding, Salman Avestimehr

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the performance of the proposed algorithm, we consider a cross-silo FL setting and use the Dirichlet distribution to create a non-IID data distribution across clients. For evaluation, we report the test accuracy at each client using the 20% of training data kept as test data for that client. Furthermore, we compare our work to state-of-the-art predefined-architecture personalization FL schemes. We demonstrate that architecture personalization yields better results than state-of-the-art personalization algorithms based solely on the optimization layer, such as Ditto (Li et al., 2021), Per-FedAvg (Fallah et al., 2020), local adaptation (Cheng et al., 2021), and KNN-Per (Marfoq et al., 2022).
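The Dirichlet (LDA) partitioning mentioned above can be sketched as follows. This is a hypothetical illustration of the standard technique, not the authors' code; the function name, seeding, and per-class splitting scheme are assumptions.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha=0.2, seed=0):
    """Partition sample indices across clients with a Dirichlet prior.

    Illustrative sketch of LDA-style non-IID splitting (the paper uses
    alpha = 0.2): for each class, the fraction of its samples assigned
    to each client is drawn from Dirichlet(alpha, ..., alpha).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        # Shuffle this class's sample indices before slicing.
        idx = rng.permutation(np.where(labels == cls)[0])
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Convert proportions to cut points and hand each client a shard.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, shard in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(shard.tolist())
    return client_indices
```

Smaller α concentrates each class on fewer clients, so α = 0.2 yields a strongly non-IID split.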
Researcher Affiliation | Collaboration | Erum Mushtaq (EMAIL), Department of Electrical and Computer Engineering, University of Southern California; Chaoyang He (EMAIL), FedML, Inc.; Jie Ding (EMAIL), School of Statistics, University of Minnesota; Salman Avestimehr (EMAIL), Department of Electrical and Computer Engineering, University of Southern California
Pseudocode | Yes | Algorithm 1: SPIDER-Trainer; Algorithm 2: SPIDER-Searcher
Open Source Code | Yes | Code is available at https://github.com/ErumMushtaq/SPIDER.git.
Open Datasets | Yes | We perform an image classification task on three well-known datasets: CIFAR10, CIFAR100, and CINIC10. The CIFAR10 dataset (Krizhevsky et al., 2009) consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class, and the CIFAR100 dataset (Krizhevsky et al., 2009) consists of 60,000 images in 100 classes, with 600 images per class. ... The CINIC10 dataset (Darlow et al., 2018) is a larger dataset and also includes images from ImageNet.
Dataset Splits | Yes | Since we need validation data for SPIDER-Searcher, we split the total training data samples present at each client into training (60%), validation (20%), and testing (20%) sets. We use this 60/20/20% train/valid/test split during Phases 1 and 2 of SPIDER training. Once each client has selected its architecture, it combines the validation data with the training data in Phase 3 and uses it for training. The test set remains the same throughout training. For all other personalization schemes used for comparison, we split each client's data samples into 80% training and 20% test for a fair comparison.
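The per-client 60/20/20 split, and the Phase 3 merge of train and validation data, can be sketched as below. This is a minimal illustration under stated assumptions; the function name and seeding are hypothetical, not taken from the SPIDER repository.

```python
import numpy as np

def split_client_data(indices, seed=0):
    """60/20/20 train/valid/test split of one client's sample indices.

    Mirrors the split described for SPIDER Phases 1-2; in Phase 3 the
    architecture is fixed, so validation data rejoins the training set
    while the test set stays unchanged.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(np.asarray(indices))
    n = len(idx)
    n_train, n_valid = int(0.6 * n), int(0.2 * n)
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test = idx[n_train + n_valid:]
    # Phase 3 training set: train + valid (80% of the client's data).
    phase3_train = np.concatenate([train, valid])
    return train, valid, test, phase3_train
```

The Phase 3 set equals the 80/20 split used for the baseline personalization schemes, which is what makes the comparison fair.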
Hardware Specification | Yes | We implement the proposed method for distributed computing with nine nodes, each equipped with an NVIDIA RTX 2080Ti GPU card. ... For implementation, we used 8 NVIDIA RTX 2080Ti GPU cards, where each GPU represents a physical node, on the fedml platform implementation for FL (He et al., 2020c).
Software Dependencies | No | The paper mentions the 'fedml platform implementation for FL (He et al., 2020c)' and a 'stochastic gradient descent (SGD) optimizer', but does not provide specific version numbers for any software libraries, frameworks, or languages used.
Experiment Setup | Yes | For empirical results on CIFAR10, CIFAR100, and CINIC10, we use a batch size of 32 for all our experiments. We use an LDA distribution with an α parameter value of 0.2. ... For SPIDER, we use the first 30 rounds as warmup rounds; for SPIDER-Searcher, we use a recovery period of 20. Furthermore, for SPIDER we search the learning rate over the range {0.01, 0.03}, and we search λ over the set {0.01, 0.1, 1}. For the other personalized schemes, such as Ditto, Per-FedAvg, KNN-Per, FedMN, and local adaptation with ResNet18, we searched the learning rate over the set {0.1, 0.3, 0.01, 0.03, 0.001, 0.003}. ... We used the stochastic gradient descent (SGD) optimizer for all the methods.
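The hyperparameter grids quoted above can be enumerated as follows. The grid values come from the paper; the enumeration helper and its name are illustrative assumptions, not the authors' tuning harness.

```python
import itertools

# Search grids quoted from the paper's experiment setup.
SPIDER_LRS = [0.01, 0.03]          # SPIDER learning-rate search range
SPIDER_LAMBDAS = [0.01, 0.1, 1]    # SPIDER lambda search set
BASELINE_LRS = [0.1, 0.3, 0.01, 0.03, 0.001, 0.003]  # baselines
BATCH_SIZE = 32

def spider_configs():
    """Enumerate every (learning_rate, lambda) pair tried for SPIDER."""
    return list(itertools.product(SPIDER_LRS, SPIDER_LAMBDAS))
```

Each configuration would then be trained with SGD (as stated in the paper) and the best pair selected on the validation split.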