Federated Learning on Virtual Heterogeneous Data with Local-Global Dataset Distillation
Authors: Chun-Yin Huang, Ruinan Jin, Can Zhao, Daguang Xu, Xiaoxiao Li
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment on both benchmark and real-world datasets that contain heterogeneous data from different sources, and further scale up to an FL scenario that contains a large number of clients with heterogeneous and class-imbalanced data. Our method outperforms state-of-the-art heterogeneous FL algorithms under various settings. |
| Researcher Affiliation | Collaboration | Chun-Yin Huang (University of British Columbia, Vector Institute); Ruinan Jin (University of British Columbia, Vector Institute); Can Zhao (NVIDIA); Daguang Xu (NVIDIA); Xiaoxiao Li (University of British Columbia, Vector Institute) |
| Pseudocode | Yes | Algorithm 1 Federated Virtual Learning with Local-global Distillation |
| Open Source Code | Yes | Our code is available at https://github.com/ubc-tea/FedLGD. |
| Open Datasets | Yes | We use the following datasets for our benchmark experiments: DIGITS = {MNIST (LeCun et al., 1998), SVHN (Netzer et al., 2011), USPS (Hull, 1994), SynthDigits (Ganin & Lempitsky, 2015), MNIST-M (Ganin & Lempitsky, 2015)}. We conduct large-scale FL experiments on CIFAR-10-C, where, like previous studies (Li et al., 2021), we apply a Dirichlet distribution with α = 2 to generate 3 partitions on each distorted CIFAR-10-C corruption (Hendrycks & Dietterich, 2019), resulting in 57 domain- and label-heterogeneous non-IID clients. For the medical datasets, we use the retina image datasets RETINA = {Drishti (Sivaswamy et al., 2014), Acrima (Diaz-Pinto et al., 2019), Rim (Batista et al., 2020), Refuge (Orlando et al., 2020)}, where each dataset contains retina images from a different station with image size 96 × 96, thus forming four clients in FL. |
| Dataset Splits | Yes | We use the SGD optimizer to update local models. If not specified, our default settings are: learning rate 10⁻², local model update epochs 1, total update rounds 100, batch size 32 for local training, and 10 virtual data update iterations. We apply a Dirichlet distribution with α = 2 to generate 3 partitions on each distorted CIFAR-10-C corruption (Hendrycks & Dietterich, 2019), resulting in 57 domain- and label-heterogeneous non-IID clients. In addition, we randomly sample a fraction of clients with ratio 0.2, 0.5, and 1 for each FL round. We use local virtual data from our initialization stage for FL methods other than ours and perform classification on each client's testing set, reporting the test accuracies. |
| Hardware Specification | No | No specific hardware details are provided for running the experiments. The acknowledgement mentions 'NVIDIA Hardware Award' but does not specify the hardware used for the reported experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper. |
| Experiment Setup | Yes | We use the SGD optimizer to update local models. If not specified, our default settings are: learning rate 10⁻², local model update epochs 1, total update rounds 100, batch size 32 for local training, and 10 virtual data update iterations. The numbers of default virtual data distillation steps for clients and server are set to 100 and 500, respectively. We use IPC ∈ {10, 50} and arch ∈ {ResNet18 (R), ConvNet (C)} to examine the performance of SOTA models and FedLGD using distilled DIGITS. Note that we fix IPC = 10 for global virtual data and vary IPC for local virtual data. The learning rate is set to 10⁻³. |
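The Dirichlet-based partitioning quoted above (α = 2, 3 partitions per corrupted CIFAR-10-C domain) can be sketched as follows. This is a minimal illustration of the standard Dirichlet label-split procedure, not the authors' code; the function name and signature are hypothetical.

```python
import numpy as np

def dirichlet_partition(labels, n_clients=3, alpha=2.0, seed=0):
    """Split sample indices into n_clients label-heterogeneous shards.

    For each class, client shares are drawn from Dirichlet(alpha);
    smaller alpha yields more skewed (non-IID) label distributions.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_idx = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        # shuffle this class's sample indices, then cut by Dirichlet shares
        idx = rng.permutation(np.where(labels == c)[0])
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for i, shard in enumerate(np.split(idx, cuts)):
            client_idx[i].extend(shard.tolist())
    return [np.array(ix) for ix in client_idx]

# Example: 1000 samples over 10 classes, split into 3 client partitions
labels = np.repeat(np.arange(10), 100)
parts = dirichlet_partition(labels, n_clients=3, alpha=2.0)
```

Repeating this split on each of the 19 CIFAR-10-C corruption types with 3 partitions each yields the 57 non-IID clients described in the paper.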