Data Leakage in Federated Averaging
Authors: Dimitar Iliev Dimitrov, Mislav Balunović, Nikola Konstantinov, Martin Vechev
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present an experimental evaluation of our proposed attack and various baselines. We conduct our experiments on two image classification datasets. One is FEMNIST... The other dataset is CIFAR100... The results indicate that our method can recover a significant portion of input images, as well as accurate estimates of class frequencies, outperforming previously developed attacks. Further, we perform several ablation studies to understand the impact that individual components of our attack have on overall attack performance. |
| Researcher Affiliation | Academia | Dimitar I. Dimitrov, Department of Computer Science, ETH Zurich; Mislav Balunović, Department of Computer Science, ETH Zurich; Nikola Konstantinov, ETH AI Center, Department of Computer Science, ETH Zurich; Martin Vechev, Department of Computer Science, ETH Zurich |
| Pseudocode | Yes | Algorithm 1 outlines the FedAvg client update algorithm for a selected client c at some communication round. Algorithm 2: Overview of our attack. Algorithm 3: Overview of our matching and averaging algorithm. |
| Open Source Code | Yes | An implementation of this algorithm successfully attacks realistic FedAvg updates with multiple epochs and batches per epoch. Code is available at https://github.com/eth-sri/fedavg_leakage. |
| Open Datasets | Yes | We conduct our experiments on two image classification datasets. One is FEMNIST, part of the commonly used federated learning framework LEAF (Caldas et al., 2018). The other dataset is CIFAR100 (Krizhevsky et al., 2009). |
| Dataset Splits | Yes | We evaluate with 100 random clients from the training set and select Nc = 50 data points from each. The other dataset is CIFAR100 (Krizhevsky et al., 2009), which consists of 32×32 images partitioned into 100 classes. Here, we simply sample 100 batches of size Nc = 50 from the training dataset to form the individual clients' data. |
| Hardware Specification | Yes | We ran all of our experiments on a single NVIDIA RTX 2080 Ti GPU. |
| Software Dependencies | No | The gradients of the reconstruction loss ℓ with respect to X̃c, required by SGD and Adam, are calculated via an automatic differentiation tool, in our case JAX (Bradbury et al., 2018). We use the linear sum assignment problem solver LinAssign provided by SciPy (Virtanen et al., 2020). The paper mentions JAX and SciPy as software tools used but does not provide specific version numbers for them, which is required for reproducibility. |
| Experiment Setup | Yes | In all experiments, we use total variation (TV) (Estrela et al., 2016) as an additional regularizer... We balance them with the rest of our reconstruction loss ℓ using the hyperparameters λTV and λclip, respectively. Additionally, we use different learning rates ηrec and learning rate decay factors γrec for solving the optimization problem in Algorithm 2. For both datasets we use 200 optimization steps and client learning rate η = 0.004... FEMNIST: We use the following hyperparameters for our FEMNIST experiments: λTV = 0.001, λclip = 2, λinv = 1000, ηrec = 0.4, and exponential learning rate decay γrec = 0.995 applied every 10 steps. CIFAR100: We use the following hyperparameters for our CIFAR100 experiments: λTV = 0.0002, λclip = 10, λinv = 6.075, ηrec = 0.1, and exponential learning rate decay γrec = 0.997 applied every 20 steps. |
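Two components surfaced by the table above lend themselves to a short sketch: the SciPy linear sum assignment solver ("LinAssign") the paper uses for its matching-and-averaging step, and the total variation regularizer it adds to the reconstruction loss ℓ. This is a minimal illustration only: the squared-L2 cost matrix and the anisotropic TV form are assumptions, not the paper's exact choices, and the paper's actual optimization runs in JAX rather than NumPy.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_reconstructions(recon, reference):
    """Match reconstructed inputs to reference inputs one-to-one with
    SciPy's linear_sum_assignment (the "LinAssign" solver the paper
    cites). The squared-L2 cost is an illustrative choice."""
    diff = recon[:, None, :] - reference[None, :, :]
    cost = (diff ** 2).sum(axis=-1)           # pairwise cost matrix
    rows, cols = linear_sum_assignment(cost)  # optimal assignment
    return rows, cols

def total_variation(img):
    """Anisotropic total-variation penalty on a 2-D image, one common
    form of the TV regularizer balanced by λTV in the paper."""
    dh = np.abs(np.diff(img, axis=0)).sum()   # vertical differences
    dw = np.abs(np.diff(img, axis=1)).sum()   # horizontal differences
    return dh + dw

# Toy usage: three 2-D points standing in for flattened images.
recon = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
reference = np.array([[2.1, 2.0], [0.1, 0.0], [1.0, 0.9]])
rows, cols = match_reconstructions(recon, reference)
print(cols.tolist())                          # -> [1, 2, 0]
print(total_variation(np.ones((4, 4))))       # constant image -> 0.0
```

The assignment step matters because FedAvg clients take multiple local steps over multiple batches, so reconstructed examples arrive in no particular order and must be aligned before averaging.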