Federated Automatic Differentiation

Authors: Keith Rush, Zachary Charles, Zachary Garrett

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply the resulting methods to a variety of benchmark FL tasks. We find that the resulting method can exhibit significantly better convergence properties than static FedOpt, and its ability to dynamically adjust the hyperparameters over time plays a crucial role in this behavior. We now present an empirical exploration of federated hypergradient descent as applied to server optimization hyperparameters.
Researcher Affiliation | Industry | Keith Rush (Google Research, Seattle, WA, USA); Zachary Charles (Google Research, Seattle, WA, USA); Zachary Garrett (Google Research, Seattle, WA, USA)
Pseudocode | Yes | Algorithm 1: FedOpt ... Algorithm 2: FedOpt with hypergradient descent.
Open Source Code | Yes | Since the initial version of this work appeared, we have developed an open-source library that implements federated AD in JAX. See (Rush et al., 2024) for details. ... Since the initial version of this work, we have developed a high-performance implementation of federated AD in JAX; see (Rush et al., 2024).
Open Datasets | Yes | We used four data sets: CIFAR-100 (Krizhevsky, 2009), EMNIST (Cohen et al., 2017), Shakespeare (McMahan et al., 2017), and Stack Overflow (Authors, 2019). ... We use the Synthetic(α, β) task proposed by Li et al. (2020).
Dataset Splits | Yes | Throughout, we compare the accuracy of the learned model on the test split of each of the data sets above. ... We sample 50 clients uniformly at random at each communication round. ... We focused on sampling clients for both purposes from a single population.
Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. It mentions federated systems generally, but no particular GPU/CPU models or infrastructure.
Software Dependencies | No | The development of ML-focused AD frameworks, such as JAX (Bradbury et al., 2018), TensorFlow (Abadi et al., 2016), and PyTorch (Paszke et al., 2019), has accelerated this... ... we have developed an open-source library that implements federated AD in JAX. ... We instantiated these parameterizations with hand-differentiated pairs of functions representing server optimizer updates implemented in TensorFlow. No specific version numbers for these software components are provided.
Experiment Setup | Yes | We use E = 1 epochs of mini-batch SGD in ClientUpdate throughout, and use the same batch sizes for each task as in (Charles et al., 2021). We set the per-client weights pk to the number of examples in each client's data set (example-weighting). We sample 50 clients uniformly at random at each communication round. ... For random initialization, learning rates were chosen via a log-uniform distribution from the range (10^-3, 10). Momentum values were chosen uniformly from the range (0, 1). For default server optimizer settings, we used a learning rate of 1.0 and a momentum value of 0.9. ... Throughout, our hypergradient descent optimizer is SGD with a learning rate of 0.01. ... Last, we use Adam (Kingma and Ba, 2015) in our hypergradient descent step (rather than SGD), with learning rate 0.01.
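The FedOpt round quoted in the pseudocode row (Algorithm 1, with example-weighted client deltas and a server optimizer) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the least-squares client loss, SGD as the server optimizer, and the names `client_update` and `fedopt_round` are all assumptions made for the sketch.

```python
import numpy as np

def client_update(model, data, lr=0.1, epochs=1):
    """One client's local mini-batch SGD (E = 1 epoch, as in the setup).

    A least-squares loss stands in for the paper's benchmark tasks.
    Returns the model delta that the client reports to the server.
    """
    X, y = data
    w = model.copy()
    for _ in range(epochs):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w - model

def fedopt_round(model, clients, server_lr=1.0):
    """One FedOpt round with SGD as the server optimizer.

    Client deltas are averaged with example weights p_k = |D_k| and
    applied as a pseudo-gradient step with the server learning rate.
    """
    weights = np.array([len(y) for _, y in clients], dtype=float)
    deltas = np.stack([client_update(model, c) for c in clients])
    avg_delta = (weights[:, None] * deltas).sum(axis=0) / weights.sum()
    return model + server_lr * avg_delta
```

With the default server learning rate of 1.0 this reduces to federated averaging of the one-epoch local models; other server optimizers (e.g. momentum 0.9, as in the default settings) would replace the plain scaled update in the last line.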
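The hypergradient-descent variant (Algorithm 2, with SGD at learning rate 0.01 on a server hyperparameter, as in the experiment setup) can be sketched similarly. The paper computes the hypergradient exactly via federated AD; this self-contained sketch substitutes a central finite difference purely for illustration, and the least-squares clients and the names `round_loss` and `hypergradient_step` are likewise assumptions.

```python
import numpy as np

def round_loss(model, eta, clients, client_lr=0.1):
    """Post-round loss as a function of the server learning rate eta.

    Each client takes one local SGD step on a least-squares loss (an
    illustrative stand-in for the paper's benchmark tasks); the server
    applies the example-weighted average delta scaled by eta.
    """
    deltas, weights = [], []
    for X, y in clients:
        w = model - client_lr * X.T @ (X @ model - y) / len(y)
        deltas.append(w - model)
        weights.append(float(len(y)))
    weights = np.array(weights)
    avg = (weights[:, None] * np.stack(deltas)).sum(axis=0) / weights.sum()
    new_model = model + eta * avg
    return sum(float(np.mean((X @ new_model - y) ** 2)) for X, y in clients)

def hypergradient_step(model, eta, clients, hyper_lr=0.01, eps=1e-4):
    """One hypergradient-descent step on eta (SGD with lr 0.01).

    The derivative of the round loss with respect to eta is estimated
    by central finite differences here; federated AD would compute it
    exactly by differentiating through the round.
    """
    g = (round_loss(model, eta + eps, clients)
         - round_loss(model, eta - eps, clients)) / (2 * eps)
    return eta - hyper_lr * g
```

Repeating this step each communication round is what lets the server learning rate adjust dynamically over training, which the assessment above identifies as the key driver of the improved convergence over static FedOpt.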