FSL-SAGE: Accelerating Federated Split Learning via Smashed Activation Gradient Estimation

Authors: Srijith Nair, Michael Lin, Peizhong Ju, Amirreza Talebi, Elizabeth Serena Bentley, Jia Liu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "5. Experimental Results and Discussion: In this section, we conduct numerical experiments to verify the efficacy of our proposed FSL-SAGE algorithm... Our empirical results also verify that it outperforms existing state-of-the-art FSL methods, offering both communication efficiency and accuracy. We conducted extensive experiments using large ResNet18 and GPT2-medium models to verify our theoretical results and demonstrate that while being much more communication efficient than existing state-of-the-art FL/SL algorithms, the accuracy of FSL-SAGE is either on-par or even better."
Researcher Affiliation | Academia | "1 Department of Electrical and Computer Engineering, The Ohio State University, Columbus, Ohio, USA; 2 Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA; 3 Department of Industrial Engineering, The Ohio State University, Columbus, Ohio, USA; 4 Air Force Research Laboratory, Rome, New York, USA. Correspondence to: Srijith Nair <EMAIL>, Jia Liu <EMAIL>."
Pseudocode | Yes | "Algorithm 1: The Code of Client i in FSL-SAGE. Algorithm 2: FSL-SAGE F-Server. Algorithm 3: (Lazy) FSL-SAGE S-Server."
Open Source Code | Yes | "Our source code is available at https://github.com/srijith1996/FSL-SAGE."
Open Datasets | Yes | "1-b) Datasets: Although FL generally applies to a wide range of machine learning tasks, we focus on two tasks: 1) image classification on CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2009); and 2) natural language generation using the E2E (Novikova et al., 2017) dataset."
Dataset Splits | No | The paper describes how data is distributed among clients to induce heterogeneity, using a Dirichlet distribution (Hsu et al., 2019), but it does not explicitly state train/test/validation split percentages or absolute counts. It implies the use of the standard datasets (CIFAR-10, CIFAR-100, E2E) but does not detail how these were split for the experimental setup.
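The Dirichlet-based client partitioning referenced above (Hsu et al., 2019) is a standard recipe: for each class, a proportion vector is drawn from a symmetric Dirichlet(α) and the class's samples are divided among clients accordingly. A minimal stdlib-only sketch, assuming the function name and signature (not taken from the paper's code):

```python
import random
from collections import defaultdict

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Partition sample indices across clients, with per-class client
    proportions drawn from a symmetric Dirichlet(alpha) distribution.
    Smaller alpha -> more heterogeneous (non-IID) client datasets."""
    rng = random.Random(seed)

    def dirichlet(k):
        # Dirichlet via normalized Gamma(alpha, 1) draws.
        g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
        s = sum(g)
        return [x / s for x in g]

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients = [[] for _ in range(num_clients)]
    for idx_list in by_class.values():
        rng.shuffle(idx_list)
        props = dirichlet(num_clients)
        # Turn proportions into contiguous slice boundaries;
        # the last client absorbs rounding leftovers.
        start = 0
        for c in range(num_clients):
            end = len(idx_list) if c == num_clients - 1 \
                else start + int(props[c] * len(idx_list))
            clients[c].extend(idx_list[start:end])
            start = end
    return clients
```

Every index is assigned to exactly one client, so the union of client shards reconstructs the full training set while class balance varies per client.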
Hardware Specification | Yes | "1-a) Compute and Baselines: We compare FSL-SAGE with FedAvg (McMahan et al., 2016), SplitFedv1 and SplitFedv2 (Thapa et al., 2022), and CSE-FSL (Mu & Shen, 2023). We use PyTorch for training on a single NVIDIA H100 NVL GPU with 80GB of memory."
Software Dependencies | No | The paper mentions using "PyTorch for training" but does not specify a version number for PyTorch or for any other software dependency.
Experiment Setup | Yes | "1-d) Hyperparameters: For image classification, we use a batch size of 256. Clients train their models for 1 epoch on their local dataset per federated averaging round. For CSE-FSL and FSL-SAGE, the cut-layer features are sent to the server-side model every 5 local steps, and for FSL-SAGE, the auxiliary models are aligned with the server every l = 10 rounds. We stop training when the communication cost incurred exceeds 200 GiB. The client-side and server-side models are optimized using Adam (Kingma & Ba, 2015) with a learning rate of 10^-3, weight decay of 10^-4, and β1 = 0.9, β2 = 0.999. For alignment, we use the same optimizer settings with no weight decay."
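The schedule in that setup mixes three cadences: cut-layer activations sent every 5 local steps, auxiliary-model alignment every l = 10 rounds, and a hard stop once cumulative communication exceeds 200 GiB. A minimal sketch of that bookkeeping, with hypothetical per-event byte costs (the real costs depend on model and batch size, which the paper does not restate here):

```python
GIB = 1024 ** 3

def run_schedule(rounds, steps_per_round, q=5, align_every=10,
                 act_cost=2 * 1024 ** 2, align_cost=50 * 1024 ** 2,
                 budget=200 * GIB):
    """Count communication events under FSL-SAGE-style cadences.
    q: local steps between activation sends; align_every: rounds between
    auxiliary-model alignments (l in the paper); act_cost/align_cost:
    placeholder bytes per event. Returns (sends, alignments, total_cost)."""
    sends = aligns = cost = 0
    for r in range(1, rounds + 1):
        for step in range(1, steps_per_round + 1):
            if step % q == 0:          # send cut-layer features every q steps
                sends += 1
                cost += act_cost
                if cost > budget:      # stop once the budget is exhausted
                    return sends, aligns, cost
        if r % align_every == 0:       # align auxiliary models every l rounds
            aligns += 1
            cost += align_cost
    return sends, aligns, cost
```

For example, 20 rounds of 10 local steps with q = 5 yields 2 activation sends per round (40 total) and alignments at rounds 10 and 20 (2 total), well under the 200 GiB budget at these placeholder costs.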