Federated Learning for Feature Generalization with Convex Constraints
Authors: Dongwon Kim, Donghee Kim, Sung Kuk Shyn, Kwangsu Kim
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments using the CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2009). Our experiments spanned both cross-silo and cross-device settings. The cross-silo setup involved a total of 10 clients, while in the cross-device setting 10% of clients were randomly selected from a pool of 50 or 100 participants. The data distribution among clients was governed by a Dirichlet distribution, with the α value determining the degree of heterogeneity; a lower α corresponds to a more heterogeneous distribution. For an extremely heterogeneous environment, we used a Dirichlet α of 0.2 with 10 local training epochs, while a more typical environment used α = 0.5 with 5 local training epochs. Our tests were conducted on both the LeNet-5 and ResNet-18 architectures. More detailed experimental settings are in the supplementary materials. |
| Researcher Affiliation | Academia | 1 Department of Computer Science and Engineering, Sungkyunkwan University, Suwon, Korea 2 Kim Jaechul Graduate School of AI, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea. Correspondence to: Dongwon Kim <EMAIL>, Kwangsu Kim <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 (training procedure of FedCONST). Input: batch size B, communication rounds K, number of clients M, local steps T, dataset D = ∪_{m∈[M]} D_m. Output: global model parameters w^K. Server executes: initialize w^0 with He initialization; for k = 0, …, K−1: for each client m = 1, …, M in parallel, send w^k to client m and set w^{k+1}_m ← ClientExecutes(m, w^k). FedCONST ClientExecutes(m, w^k): assign the global model to the local model, w^k_m ← w^k; for each local epoch t = 1, …, T and each batch (x_{m,1:B}, y_{m,1:B}) ∈ D_m: per layer l and channel/feature c, center the gradient g^k_{m,t} ← C(g^k_{m,t}), project the gradient g^k_{m,t} ← P_{w^k}(g^k_{m,t}), and apply the update w^k_m ← w^k_m − η g^k_{m,t}; return w^{k+1}_m to the server. |
| Open Source Code | No | No explicit statement or link to open-source code for the methodology described in this paper is provided. The paper discusses its method and experiments but does not offer access to its implementation. |
| Open Datasets | Yes | We conducted experiments using the CIFAR-10 and CIFAR100 datasets (Krizhevsky et al., 2009). |
| Dataset Splits | Yes | For the evaluation of test loss on client data, we partitioned the data such that 10% of each client's data was reserved as local test data. |
| Hardware Specification | Yes | All experimental evaluations were executed on two Nvidia 3090 GPUs. |
| Software Dependencies | No | The paper lists specific hyperparameters for various algorithms (MOON, FedProx, FedDyn, FedSAM) and general training parameters (local learning momentum, weight decay, batch size, learning rate), but does not provide specific software names with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | Hyperparameters. In our experiments, we configured the algorithms with specific hyperparameters: MOON: µ = 0.01, temperature = 1; FedProx: µ = 0.01; FedDyn: α = 1; FedSAM: ρ = 1.0. Model Configuration. We employed both the ResNet-18 and LeNet-5 architectures for our experiments. When applying our constraints, we removed the batch normalization layer to leverage the weight normalization (WN) effect. Additionally, biases were omitted from the models, as they had only a minor effect on overall model performance. Other Experimental Settings. For the training parameters, we set the local learning momentum to 0.9, applied a weight decay of 1e-5, and used a batch size of 50. The learning rate was set to 0.01 for local training and 1.0 for global updates. All experimental evaluations were executed on two Nvidia 3090 GPUs. |
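The Dirichlet-based client partitioning described in the Research Type row can be sketched as follows. This is a minimal NumPy sketch of the standard per-class Dirichlet split, not the paper's released code; the function name `dirichlet_partition` and the `seed` parameter are illustrative assumptions.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with per-class Dirichlet proportions.

    A lower alpha yields a more heterogeneous (non-IID) label distribution
    per client, matching the paper's alpha = 0.2 (extreme) and 0.5 (typical).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Draw this class's share for each client from Dirichlet(alpha, ..., alpha).
        props = rng.dirichlet(alpha * np.ones(num_clients))
        # Convert proportions to split points over the class's samples.
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for m, part in enumerate(np.split(idx, cuts)):
            client_indices[m].extend(part.tolist())
    return [np.array(ix) for ix in client_indices]
```

For example, partitioning 1,000 labeled samples over 10 clients with `alpha=0.5` assigns every index to exactly one client, with class proportions varying sharply across clients.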
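The client-side update in Algorithm 1 (center the gradient, project it, then apply an SGD step) can be sketched in NumPy. This is a hedged illustration under two assumptions not spelled out in the extracted pseudocode: that centering C(·) subtracts the per-channel (per-row) mean, and that the projection P_w(·) removes each row's component along the corresponding weight row, which is what would produce the weight-normalization effect the paper mentions; the exact convex-constraint operators may differ.

```python
import numpy as np

def center_gradient(g):
    """C(g): subtract the mean over each output channel (row) of the gradient."""
    return g - g.mean(axis=1, keepdims=True)

def project_gradient(w, g, eps=1e-12):
    """P_w(g): remove each row's component parallel to the weight row, so the
    update moves tangentially to the per-channel weight-norm constraint."""
    coef = (g * w).sum(axis=1, keepdims=True) / ((w * w).sum(axis=1, keepdims=True) + eps)
    return g - coef * w

def local_update(w, batch_grads, lr=0.01):
    """One local pass in Algorithm 1's style: center, project, then SGD step."""
    for g in batch_grads:
        g = project_gradient(w, center_gradient(g))
        w = w - lr * g
    return w
```

After projection, each gradient row is orthogonal to its weight row, so a small step leaves per-channel weight norms approximately unchanged.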