Partially Personalized Federated Learning: Breaking the Curse of Data Heterogeneity
Authors: Konstantin Mishchenko, Rustem Islamov, Eduard Gorbunov, Samuel Horváth
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 experiments. "The detailed description of all experimental setups is deferred to the Appendix E." Table 2 reports test accuracy across model variants and datasets; for FedAlt and FedSim, the best-performing option in each experiment is reported. Figure 1: (a) convergence of FFGG varying the number of local conjugate-gradient steps τ; (b-d) comparison of FFGG against Scaffold, Local GD, and L2GD varying the number of local gradient steps (denoted τ for all methods). |
| Researcher Affiliation | Collaboration | Konstantin Mishchenko (Meta, France); Rustem Islamov (University of Basel, Switzerland); Eduard Gorbunov (Mohamed bin Zayed University of Artificial Intelligence, United Arab Emirates); Samuel Horváth (Mohamed bin Zayed University of Artificial Intelligence, United Arab Emirates) |
| Pseudocode | Yes | Algorithm 1 Fine-tuning Followed by Global Gradient (FFGG) ... Algorithm 2 Local GD fine-tuner ... Algorithm 3 Asynchronous FFGG ... Algorithm 4 Local FFGG |
| Open Source Code | Yes | Our implementation for this section is available at https://github.com/Rustem-Islamov/FL_representations. |
| Open Datasets | Yes | FEMNIST (character recognition), GLDv2 (visual landmark recognition), and Stack Overflow (next-word prediction)... FEMNIST (Cohen et al., 2017)... Google Landmarks Dataset v2 (GLDv2) (Weyand et al., 2020)... federated version of the GLDv2 dataset by Hsu et al. (2020)... Stack Overflow dataset made available by TensorFlow Federated... ImageNet dataset (Deng et al., 2009). |
| Dataset Splits | Yes | GLDv2: 50% of the data from each client is allocated as a test set. EMNIST: "We specifically consider clients that have a minimum of 100 training points and 25 testing points." Stack Overflow: "we only include clients with a minimum of 100 training sequences and 10 testing sequences... we consider a maximum of 1000 training sequences per client." |
| Hardware Specification | No | The paper mentions "execution on CPUs" and using the "Ray package (Moritz et al., 2018) to parallelize the execution" but does not specify any particular CPU models, processor types, or memory details. |
| Software Dependencies | No | The paper mentions several software components, including SciPy (Virtanen et al., 2020), the Ray package (Moritz et al., 2018), TensorFlow Federated, and PyTorch, but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | E.7 Hyperparameters and evaluation details: "The hyperparameters we use are given in Table 3." Table 3 (hyperparameters per dataset/task) lists: batch size, devices per round, local epochs, server optimizer, client optimizer, global scheduler, warm-up, LR decay rounds, max gradient norm, number of rounds, server learning rate, and client learning rate. |
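To make the Pseudocode row concrete, the overall shape of Algorithm 1 (Fine-tuning Followed by Global Gradient) with a Local GD fine-tuner (Algorithm 2) can be sketched as below. This is a minimal illustration, not the paper's implementation: the quadratic per-client loss, all problem dimensions, step sizes, and round counts are assumptions chosen for the sketch, and only the two-phase structure (local fine-tuning of the personal block, then a global gradient step on the shared block) follows the algorithm names quoted in the table.

```python
import numpy as np

# Partially personalized quadratic toy problem (illustrative assumption):
#   f_i(w, u_i) = 0.5 * ||A_i w + B_i u_i - c_i||^2,
# where w is shared across clients and u_i is client i's personal block.
rng = np.random.default_rng(0)
n_clients, d_shared, d_pers = 4, 5, 3
A = [rng.standard_normal((8, d_shared)) for _ in range(n_clients)]
B = [rng.standard_normal((8, d_pers)) for _ in range(n_clients)]
c = [rng.standard_normal(8) for _ in range(n_clients)]

def residual(i, w, u):
    return A[i] @ w + B[i] @ u - c[i]

def finetune_personal(i, w, u, tau=20, lr=0.05):
    """Local GD fine-tuner analogue: tau gradient steps on the
    personal block u_i with the shared parameters w frozen."""
    for _ in range(tau):
        u = u - lr * (B[i].T @ residual(i, w, u))
    return u

w = np.zeros(d_shared)
U = [np.zeros(d_pers) for _ in range(n_clients)]
for _ in range(200):  # communication rounds
    # Phase 1: each client fine-tunes its personal parameters.
    U = [finetune_personal(i, w, U[i]) for i in range(n_clients)]
    # Phase 2: server takes one gradient step on the shared block,
    # averaging client gradients evaluated at the fine-tuned U.
    g = np.mean([A[i].T @ residual(i, w, U[i]) for i in range(n_clients)],
                axis=0)
    w = w - 0.02 * g

loss = np.mean([0.5 * np.linalg.norm(residual(i, w, U[i])) ** 2
                for i in range(n_clients)])
```

With stable step sizes, each phase is a descent step on a convex quadratic, so the averaged loss decreases monotonically from its value at the zero initialization.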
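The client-filtering rules quoted in the Dataset Splits row (minimum train/test sizes per client, plus a per-client cap on training sequences for Stack Overflow) can be sketched as a small helper. The dict-of-tuples client format and the function name are illustrative assumptions; only the thresholds come from the quoted text.

```python
def filter_clients(clients, min_train, min_test, max_train=None):
    """Keep clients meeting minimum train/test sizes.

    clients: dict mapping client id -> (train_examples, test_examples).
    If max_train is set (e.g. 1000 for Stack Overflow), the training
    examples of each kept client are truncated to that cap.
    """
    kept = {}
    for cid, (train, test) in clients.items():
        if len(train) >= min_train and len(test) >= min_test:
            if max_train is not None:
                train = train[:max_train]
            kept[cid] = (train, test)
    return kept


# Example: EMNIST-style thresholds (>= 100 train, >= 25 test).
clients = {
    "a": (list(range(150)), list(range(30))),  # kept
    "b": (list(range(50)), list(range(30))),   # too few train points
}
kept = filter_clients(clients, min_train=100, min_test=25)
```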