CoCoFL: Communication- and Computation-Aware Federated Learning via Partial NN Freezing and Quantization
Authors: Kilian Pfeiffer, Martin Rapp, Ramin Khalili, Joerg Henkel
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Partial quantization of NN models results in hardware-specific gains in execution time and memory. Hence, our evaluation follows a hybrid approach, where we profile on-device training loops on real hardware and take the profiling information to perform simulations of distributed systems. This allows for the evaluation of large systems with hundreds or thousands of devices. |
| Researcher Affiliation | Collaboration | Kilian Pfeiffer (Karlsruhe Institute of Technology), Martin Rapp (Karlsruhe Institute of Technology), Ramin Khalili (Huawei Research Center Munich), Jörg Henkel (Karlsruhe Institute of Technology) |
| Pseudocode | Yes | Algorithm 1 Each Selected Device c (Client) in Each Round ... Algorithm 2 FL Server (Synchronization and Aggregation) |
| Open Source Code | Yes | The code is available at https://github.com/k1l1/CoCoFL. |
| Open Datasets | Yes | For each experiment, we distribute the data from the datasets CIFAR10/100 (Krizhevsky & Hinton, 2009), FEMNIST (Cohen et al., 2017), CINIC10 (Darlow et al., 2018), XChest (Wang et al., 2017), IMDB (Maas et al., 2011), and Shakespeare (Caldas et al., 2019) to devices in C. |
| Dataset Splits | No | The paper describes how data is distributed among clients (iid, non-iid, rc-non-iid) and how devices are grouped by capability (strong, medium, weak: e.g., data is 'randomly distributed to all devices' for iid, the 'number of samples per class varies between devices' for non-iid, 'medium devices has 2/3 of the computational and memory resources', 'weak devices has 1/3'). However, it does not explicitly provide train/validation/test splits for the datasets themselves, file names or URLs for custom splits, or details on stratified splitting or cross-validation, beyond how data is partitioned across federated learning clients. |
| Hardware Specification | Yes | We employ two different hardware platforms to factor out potential microarchitecture-dependent peculiarities w.r.t. quantization or freezing: x64 AMD Ryzen 7 and a Raspberry Pi with an ARMv8 CPU. |
| Software Dependencies | Yes | We implement the presented training scheme in PyTorch 1.10 (Paszke et al., 2019), which supports int8 quantization. |
| Experiment Setup | Yes | We train with the optimizer SGD with an initial learning rate η of 0.1. For a fair comparison we do not use momentum, as FjORD is incompatible with a stateful optimizer. The remaining NN-specific hyperparameters, learning rate decay, and weight decay are given in Table 1. ... A mini-batch size of 32 is used for all experiments. |
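The Pseudocode row references Algorithm 2, the server-side synchronization and aggregation step under partial freezing: each client uploads parameters only for the layers it actually trained, and the server averages each layer over the contributing clients. The following is a minimal pure-Python sketch of that idea, not the paper's implementation; the function name `aggregate`, the dict-of-lists parameter layout, and the unweighted mean (the paper may weight clients, e.g., by sample count) are all assumptions for illustration.

```python
def aggregate(global_params, client_uploads):
    """Average each layer over the clients that submitted an update for it.

    global_params: dict layer_name -> list[float], the current global model.
    client_uploads: list of dicts layer_name -> list[float]; each dict contains
        only the layers that client trained (its frozen layers are omitted).
    Layers no client trained keep their previous global values.
    """
    new_params = {}
    for layer, values in global_params.items():
        updates = [u[layer] for u in client_uploads if layer in u]
        if not updates:  # layer was frozen on every selected client
            new_params[layer] = list(values)
            continue
        # Element-wise (unweighted) mean over the contributing clients.
        new_params[layer] = [sum(col) / len(updates) for col in zip(*updates)]
    return new_params


if __name__ == "__main__":
    global_model = {"conv1": [0.0, 0.0], "fc": [1.0, 1.0]}
    uploads = [
        {"conv1": [1.0, 2.0], "fc": [3.0, 3.0]},  # strong device: trained everything
        {"fc": [5.0, 1.0]},                        # weak device: conv1 stayed frozen
    ]
    print(aggregate(global_model, uploads))
    # → {'conv1': [1.0, 2.0], 'fc': [4.0, 2.0]}
```

Averaging per layer (rather than over whole models) is what lets heterogeneous devices contribute: a weak device that freezes its early layers still improves the layers it can afford to train.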