Federated Learning for Feature Generalization with Convex Constraints
Authors: Dongwon Kim, Donghee Kim, Sung Kuk Shyn, Kwangsu Kim
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments using the CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2009). Our experiments spanned both cross-silo and cross-device settings. The cross-silo setup involved a total of 10 clients, while in the cross-device setting 10% of clients were randomly selected from a pool of 50 or 100 participants. The data distribution among clients was governed by a Dirichlet distribution, with the α value determining the degree of heterogeneity; a lower α corresponds to a more heterogeneous distribution. For an extremely heterogeneous environment, we used a Dirichlet α of 0.2 with 10 local training epochs, while a more typical environment used α = 0.5 with 5 local training epochs. Our tests were conducted on both the LeNet-5 and ResNet-18 architectures. More detailed experimental settings are in the supplementary materials. |
| Researcher Affiliation | Academia | 1 Department of Computer Science and Engineering, Sungkyunkwan University, Suwon, Korea 2 Kim Jaechul Graduate School of AI, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea. Correspondence to: Dongwon Kim <EMAIL>, Kwangsu Kim <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 (training procedure of FedCONST). Input: batch size B, communication rounds K, number of clients M, local steps T, dataset D = ∪_{m∈[M]} D_m. Output: global model parameters w^K. Server executes: initialize w^0 with He initialization; for k = 0, …, K−1: for each client m = 1, …, M in parallel, send w^k to client m and set w^{k+1}_m ← ClientExecutes(m, w^k). FedCONST ClientExecutes(m, w^k): assign the global model to the local model, w^k_m ← w^k; for each local epoch t = 1, …, T and each batch (x_{m,1:B}, y_{m,1:B}) ∈ D_m: per layer l and channel/feature c, center the gradient g^k_{m,t} ← C(g^k_{m,t}), project the gradient g^k_{m,t} ← P_{w^k}(g^k_{m,t}), and apply the update w^k_m ← w^k_m − η g^k_{m,t}; return w^{k+1}_m to the server. |
| Open Source Code | No | No explicit statement or link to open-source code for the methodology described in this paper is provided. The paper discusses its method and experiments but does not offer access to its implementation. |
| Open Datasets | Yes | We conducted experiments using the CIFAR-10 and CIFAR100 datasets (Krizhevsky et al., 2009). |
| Dataset Splits | Yes | For the evaluation of test loss on client data, we partitioned the data such that 10% of each client's data was reserved as local test data. |
| Hardware Specification | Yes | All experimental evaluations were executed on two Nvidia 3090 GPUs. |
| Software Dependencies | No | The paper lists specific hyperparameters for various algorithms (MOON, FedProx, FedDyn, FedSAM) and general training parameters (local learning momentum, weight decay, batch size, learning rate), but does not provide specific software names with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | Hyperparameters. In our experiments, we configured the algorithms with specific hyperparameters: MOON: µ = 0.01, temperature = 1; FedProx: µ = 0.01; FedDyn: α = 1; FedSAM: ρ = 1.0. Model Configuration. We employed both the ResNet-18 and LeNet-5 architectures for our experiments. When applying our constraints, we removed the batch normalization layer to leverage the weight normalization (WN) effect. Additionally, biases were omitted from the models, as they had only a minor effect on overall model performance. Other Experimental Settings. For the training parameters, we set the local learning momentum to 0.9, applied a weight decay of 1e-5, and used a batch size of 50. The learning rate was set to 0.01 for local training and 1.0 for global updates. All experimental evaluations were executed on two Nvidia 3090 GPUs. |
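The Dirichlet-based client partitioning described in the Research Type row can be sketched as follows. This is a minimal NumPy sketch of the standard per-class Dirichlet split, not the paper's released code; the function name `dirichlet_partition` and the `seed` parameter are illustrative assumptions.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with per-class Dirichlet proportions.

    A lower alpha yields a more heterogeneous (non-IID) label distribution
    per client, matching the paper's alpha = 0.2 (extreme) and 0.5 (typical).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Draw this class's share for each client from Dirichlet(alpha, ..., alpha).
        props = rng.dirichlet(alpha * np.ones(num_clients))
        # Convert proportions to split points over the class's samples.
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for m, part in enumerate(np.split(idx, cuts)):
            client_indices[m].extend(part.tolist())
    return [np.array(ix) for ix in client_indices]
```

For example, partitioning 1,000 labeled samples over 10 clients with `alpha=0.5` assigns every index to exactly one client, with class proportions varying sharply across clients.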
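The client-side update in Algorithm 1 (center the gradient, project it, then apply an SGD step) can be sketched in NumPy. This is a hedged illustration under two assumptions not spelled out in the extracted pseudocode: that centering C(·) subtracts the per-channel (per-row) mean, and that the projection P_w(·) removes each row's component along the corresponding weight row, which is what would produce the weight-normalization effect the paper mentions; the exact convex-constraint operators may differ.

```python
import numpy as np

def center_gradient(g):
    """C(g): subtract the mean over each output channel (row) of the gradient."""
    return g - g.mean(axis=1, keepdims=True)

def project_gradient(w, g, eps=1e-12):
    """P_w(g): remove each row's component parallel to the weight row, so the
    update moves tangentially to the per-channel weight-norm constraint."""
    coef = (g * w).sum(axis=1, keepdims=True) / ((w * w).sum(axis=1, keepdims=True) + eps)
    return g - coef * w

def local_update(w, batch_grads, lr=0.01):
    """One local pass in Algorithm 1's style: center, project, then SGD step."""
    for g in batch_grads:
        g = project_gradient(w, center_gradient(g))
        w = w - lr * g
    return w
```

After projection, each gradient row is orthogonal to its weight row, so a small step leaves per-channel weight norms approximately unchanged.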