Adaptive Gradient Clipping for Robust Federated Learning
Authors: Youssef Allouah, Rachid Guerraoui, Nirupam Gupta, Ahmed Jellouli, Geovani Rizk, John Stephan
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on benchmark image classification tasks confirm these theoretical insights, demonstrating that ARC significantly enhances robustness, particularly in highly heterogeneous and adversarial settings. |
| Researcher Affiliation | Academia | 1EPFL, Switzerland 2University of Copenhagen, Denmark Correspondence to: EMAIL |
| Pseudocode | Yes | Algorithm 1 Robust Distributed Gradient Descent (Robust-DGD) ... Algorithm 2 Adaptive Robust Clipping (ARC) ... Algorithm 3 Robust Distributed Stochastic Gradient Descent (Robust-DSGD) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing code for the methodology described, nor does it include a direct link to a code repository. |
| Open Datasets | Yes | We conduct experiments on MNIST (Deng, 2012), Fashion MNIST (Xiao et al., 2017), and CIFAR-10 (Krizhevsky et al., 2014) |
| Dataset Splits | No | The paper describes how data heterogeneity is simulated by distributing data among workers using a Dirichlet distribution of parameter α and mentions data normalization and augmentation, but it does not explicitly state the training/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or other computer specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch' in Section 5.2 regarding model initialization, but it does not specify a version number or list other software dependencies with their versions. |
| Experiment Setup | Yes | On MNIST and Fashion-MNIST, we train a convolutional neural network (CNN) of 431,080 parameters with batch size b = 25, T = 1000, γ = 0.1, and momentum parameter β = 0.9. Moreover, the negative log likelihood (NLL) loss function is used, along with an ℓ2-regularization of 10⁻⁴. On CIFAR-10, we train a CNN of 1,310,922 parameters. We set b = 50, T = 2000, β = 0.9, and γ = 0.05 decaying once at step 1500. Finally, we use the NLL loss function with an ℓ2-regularization of 10⁻². ... The comprehensive experimental setup and the architecture of the models are presented in Table 2. |
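The Dataset Splits row notes that heterogeneity is simulated by distributing data among workers via a Dirichlet distribution of parameter α. A minimal sketch of that standard partitioning scheme is below; the function name and toy sizes are illustrative, not taken from the paper.

```python
import numpy as np

def dirichlet_partition(labels, num_workers, alpha, seed=0):
    """Split sample indices across workers with label skew controlled by a
    Dirichlet(alpha) prior: small alpha yields highly heterogeneous shards,
    large alpha approaches an i.i.d. split."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    shards = [[] for _ in range(num_workers)]
    for c in np.unique(labels):
        # Shuffle the indices of class c, then carve them up according to
        # per-worker proportions drawn from Dirichlet(alpha).
        idx = rng.permutation(np.where(labels == c)[0])
        props = rng.dirichlet(alpha * np.ones(num_workers))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for worker, part in enumerate(np.split(idx, cuts)):
            shards[worker].extend(part.tolist())
    return [np.array(s, dtype=int) for s in shards]

# Toy usage: 10 classes x 10 samples, 5 workers, alpha = 0.1 (strong skew).
labels = np.repeat(np.arange(10), 10)
shards = dirichlet_partition(labels, num_workers=5, alpha=0.1)
```

Every sample lands on exactly one worker, so the shards form a disjoint cover of the dataset regardless of α.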
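The Pseudocode row lists Algorithm 2, Adaptive Robust Clipping (ARC). A minimal sketch of the adaptive-clipping idea is below: clip the k largest-norm worker gradients down to the norm of the (k+1)-th largest before robust aggregation. How ARC actually sets k (from the Byzantine fraction) and any other details are specified in the paper's Algorithm 2; the code here is an illustrative stand-in, not the authors' implementation.

```python
import numpy as np

def adaptive_clip(grads, k):
    """Clip the k largest-norm gradients to the norm of the (k+1)-th
    largest; all other gradients pass through unchanged. Illustrative
    sketch only -- not the paper's exact ARC rule."""
    grads = np.asarray(grads, dtype=float)
    norms = np.linalg.norm(grads, axis=1)
    order = np.argsort(norms)[::-1]      # indices sorted by norm, largest first
    threshold = norms[order[k]]          # norm of the (k+1)-th largest gradient
    clipped = grads.copy()
    for i in order[:k]:
        if norms[i] > 0:
            clipped[i] *= threshold / norms[i]  # rescale onto the threshold ball
    return clipped

# Toy usage: 5 worker gradients in R^2; the 2 largest get clipped to
# the norm of the third largest (sqrt(2) here).
grads = np.array([[10.0, 0.0], [0.0, 8.0], [1.0, 1.0], [0.5, 0.0], [0.0, 0.2]])
clipped = adaptive_clip(grads, k=2)
```

Because the threshold adapts to the honest gradients' own scale rather than being a fixed constant, outlier gradients are bounded without distorting well-behaved ones.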