Adaptive Gradient Clipping for Robust Federated Learning

Authors: Youssef Allouah, Rachid Guerraoui, Nirupam Gupta, Ahmed Jellouli, Geovani Rizk, John Stephan

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on benchmark image classification tasks confirm these theoretical insights, demonstrating that ARC significantly enhances robustness, particularly in highly heterogeneous and adversarial settings."
Researcher Affiliation | Academia | "1 EPFL, Switzerland; 2 University of Copenhagen, Denmark. Correspondence to: EMAIL"
Pseudocode | Yes | "Algorithm 1: Robust Distributed Gradient Descent (Robust-DGD) ... Algorithm 2: Adaptive Robust Clipping (ARC) ... Algorithm 3: Robust Distributed Stochastic Gradient Descent (Robust-DSGD)"
Open Source Code | No | The paper neither states that code for the described methodology will be released nor includes a link to a code repository.
Open Datasets | Yes | "We conduct experiments on MNIST (Deng, 2012), Fashion-MNIST (Xiao et al., 2017), and CIFAR-10 (Krizhevsky et al., 2014)."
Dataset Splits | No | The paper describes how data heterogeneity is simulated by distributing data among workers via a Dirichlet distribution with parameter α, and mentions data normalization and augmentation, but it does not state the training/validation/test splits (e.g., percentages or sample counts) needed for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details, such as GPU models, CPU types, or other machine specifications used to run the experiments.
Software Dependencies | No | The paper mentions PyTorch in Section 5.2 regarding model initialization, but it does not specify a version number or list other software dependencies with their versions.
Experiment Setup | Yes | "On MNIST and Fashion-MNIST, we train a convolutional neural network (CNN) of 431,080 parameters with batch size b = 25, T = 1000, γ = 0.1, and momentum parameter β = 0.9. Moreover, the negative log likelihood (NLL) loss function is used, along with an ℓ2-regularization of 10^-4. On CIFAR-10, we train a CNN of 1,310,922 parameters. We set b = 50, T = 2000, β = 0.9, and γ = 0.05 decaying once at step 1500. Finally, we use the NLL loss function with an ℓ2-regularization of 10^-2. ... The comprehensive experimental setup and the architecture of the models are presented in Table 2."