Federated Learning on Virtual Heterogeneous Data with Local-Global Dataset Distillation

Authors: Chun-Yin Huang, Ruinan Jin, Can Zhao, Daguang Xu, Xiaoxiao Li

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We experiment on both benchmark and real-world datasets that contain heterogeneous data from different sources, and further scale up to an FL scenario that contains a large number of clients with heterogeneous and class-imbalanced data. Our method outperforms state-of-the-art heterogeneous FL algorithms under various settings.
Researcher Affiliation Collaboration Chun-Yin Huang (EMAIL), University of British Columbia & Vector Institute; Ruinan Jin (EMAIL), University of British Columbia & Vector Institute; Can Zhao (EMAIL), NVIDIA; Daguang Xu (EMAIL), NVIDIA; Xiaoxiao Li (EMAIL), University of British Columbia & Vector Institute
Pseudocode Yes Algorithm 1 Federated Virtual Learning with Local-global Distillation
Open Source Code Yes Our code is available at https://github.com/ubc-tea/FedLGD.
Open Datasets Yes We use the following datasets for our benchmark experiments: DIGITS = {MNIST (LeCun et al., 1998), SVHN (Netzer et al., 2011), USPS (Hull, 1994), SynthDigits (Ganin & Lempitsky, 2015), MNIST-M (Ganin & Lempitsky, 2015)}. We conduct large-scale FL experiments on CIFAR10-C, where, like previous studies (Li et al., 2021), we apply a Dirichlet distribution with concentration parameter 2 to generate 3 partitions on each distorted CIFAR10-C corruption (Hendrycks & Dietterich, 2019), resulting in 57 domain- and label-heterogeneous non-IID clients. For the medical dataset, we use the retina image datasets RETINA = {Drishti (Sivaswamy et al., 2014), Acrima (Diaz-Pinto et al., 2019), Rim (Batista et al., 2020), Refuge (Orlando et al., 2020)}, where each dataset contains retina images from different stations with image size 96×96, thus forming four clients in FL.
Dataset Splits Yes We use the SGD optimizer to update local models. If not specified, our default settings are: learning rate 10^-2, local model update epochs 1, total update rounds 100, batch size for local training 32, and number of virtual data update iterations 10. We apply a Dirichlet distribution with concentration parameter 2 to generate 3 partitions on each distorted CIFAR10-C corruption (Hendrycks & Dietterich, 2019), resulting in 57 domain- and label-heterogeneous non-IID clients. In addition, we randomly sample a fraction of clients with ratio 0.2, 0.5, or 1 for each FL round. We use local virtual data from our initialization stage for FL methods other than ours, perform classification on each client's testing set, and report the test accuracies.
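The Dirichlet-based client partition quoted above can be sketched as follows. This is a minimal illustration of the standard per-class Dirichlet split used by Li et al. (2021), not the paper's released code; the function and parameter names are our own.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha=2.0, seed=0):
    """Split sample indices across clients so each class is divided
    according to Dirichlet(alpha) proportions. Smaller alpha yields
    more label imbalance; the quoted setting uses parameter 2."""
    rng = np.random.default_rng(seed)
    client_idx = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Fraction of this class assigned to each client.
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for cid, part in enumerate(np.split(idx, cuts)):
            client_idx[cid].extend(part.tolist())
    return [np.array(ix) for ix in client_idx]
```

Applied per corruption type of CIFAR10-C with 3 partitions each, a split like this would produce the 57 heterogeneous clients described in the quote.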
Hardware Specification No No specific hardware details are provided for running the experiments. The acknowledgement mentions 'NVIDIA Hardware Award' but does not specify the hardware used for the reported experiments.
Software Dependencies No No specific software dependencies with version numbers are mentioned in the paper.
Experiment Setup Yes We use the SGD optimizer to update local models. If not specified, our default settings are: learning rate 10^-2, local model update epochs 1, total update rounds 100, batch size for local training 32, and number of virtual data update iterations 10. The default numbers of virtual data distillation steps for clients and server are set to 100 and 500, respectively. We use IPC ∈ {10, 50} and arch ∈ {ResNet18 (R), ConvNet (C)} to examine the performance of SOTA models and FedLGD using distilled DIGITS. Note that we fix IPC = 10 for global virtual data and vary IPC for local virtual data. The learning rate is set to 10^-3.
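The quoted defaults (SGD, learning rate 10^-2, 1 local epoch, batch size 32, 100 rounds) can be wired into a generic local-update-and-average loop like the sketch below. This shows only the FedAvg-style skeleton on a toy least-squares model; FedLGD itself additionally distills local and global virtual data, which is omitted here, and the helper names are our own.

```python
import numpy as np

# Defaults quoted in the setup above.
LR, LOCAL_EPOCHS, ROUNDS, BATCH = 1e-2, 1, 100, 32

def local_sgd(w, X, y, lr=LR, epochs=LOCAL_EPOCHS, batch=BATCH, seed=0):
    """One client's local update: mini-batch SGD on a least-squares loss."""
    rng = np.random.default_rng(seed)
    w = w.copy()
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for s in range(0, len(X), batch):
            b = order[s:s + batch]
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad
    return w

def fedavg_round(w, clients):
    """Run local SGD on every client, then average the weights uniformly."""
    return np.mean([local_sgd(w, X, y, seed=i)
                    for i, (X, y) in enumerate(clients)], axis=0)
```

In the paper's setting the averaged model would be trained for 100 such rounds, optionally sampling only a fraction (0.2, 0.5, or 1) of clients per round.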