Adaptive Self-Distillation for Minimizing Client Drift in Heterogeneous Federated Learning
Authors: M Yashwanth, Gaurav Kumar Nayak, Arya Singh, Yogesh Simmhan, Anirban Chakraborty
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of our approach through extensive experiments on multiple real-world benchmarks and show substantial gains in performance when the proposed regularizer is combined with popular FL methods. The link to the code is https://github.com/vcl-iisc/fed-adaptive-self-distillation. |
| Researcher Affiliation | Academia | M. Yashwanth (Indian Institute of Science), Gaurav Kumar Nayak (Indian Institute of Technology (IIT) Roorkee), Arya Singh (Indian Institute of Science), Yogesh Simmhan (Indian Institute of Science), Anirban Chakraborty (Indian Institute of Science) |
| Pseudocode | No | The paper describes the proposed method through mathematical equations and textual explanations, for example: 'We now describe the proposed method where each client k minimizes f_k(w) as defined below: f_k(w) = L_k(w) + λ L_k^ASD(w) (Eq. 2)'. There are no clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The link to the code is https://github.com/vcl-iisc/fed-adaptive-self-distillation. |
| Open Datasets | Yes | We perform the experiments on CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), and Tiny-ImageNet (Le & Yang, 2015) datasets with different degrees of heterogeneity in the balanced settings |
| Dataset Splits | Yes | For generating non-iid data, a Dirichlet distribution is used. To simulate the effect of label imbalance, for every client we sample the probability distribution over the classes from the aforementioned Dirichlet distribution, p_k^dir = Dir(δ, C). ...By setting the concentration parameter δ to 0.6 and 0.3, we sample the data across the labels for each client, controlling δ to move from moderate to high heterogeneity. We set the total number of clients to 100 in all our experiments. We set the client participation rate to 0.1, i.e., 10 percent of clients are sampled on average per communication round |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, processors, memory) used for running the experiments are mentioned in the paper. The paper refers to 'edge devices' in general terms but does not specify the hardware used for their experimental setup. |
| Software Dependencies | No | No specific version numbers for software libraries or dependencies are provided. The paper mentions 'We build our experiments using publicly available codebase by (Acar et al., 2021)' and 'We use PyTorch style representation.' but does not specify versions for these or other software. |
| Experiment Setup | Yes | We set the total number of clients to 100 in all our experiments. We set the client participation rate to 0.1, i.e., 10 percent of clients are sampled on average per communication round... Hyperparameters: the SGD algorithm with a learning rate of 0.1 and a per-round learning-rate decay of 0.998 is used to train the client models. Temperature τ is set to 2.0. We only tune the hyper-parameter λ. More hyperparameter setting details and the impact of λ and τ are provided in Sec. A.3 and A.6 of the appendix, respectively. A batch size (B) of 50 and a learning rate of 0.1 with decay of 0.998 are employed for all experiments unless specified. |
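The client objective quoted above, f_k(w) = L_k(w) + λ L_k^ASD(w) with temperature τ = 2.0, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: it shows a plain temperature-scaled distillation term added to the task loss, and omits the paper's adaptive per-sample weighting; the function names (`distillation_loss`, `client_objective`) and the use of the global model's logits as the teacher are assumptions for illustration.

```python
import numpy as np

def softmax(logits, tau=1.0):
    """Temperature-scaled softmax, numerically stabilized."""
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, tau=2.0):
    """KL(teacher || student) at temperature tau, averaged over the batch.
    The tau**2 factor keeps gradient magnitudes comparable across temperatures."""
    p_t = softmax(teacher_logits, tau)
    p_s = softmax(student_logits, tau)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1)
    return (tau ** 2) * kl.mean()

def client_objective(task_loss, student_logits, teacher_logits, lam=1.0, tau=2.0):
    # f_k(w) = L_k(w) + lambda * L_k^ASD(w)   (Eq. 2 in the paper, simplified)
    return task_loss + lam * distillation_loss(student_logits, teacher_logits, tau)
```

When the local (student) and global (teacher) logits agree, the regularizer vanishes and the objective reduces to the plain client loss; as the local model drifts from the global one, the KL term grows and pulls it back.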
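The Dirichlet-based non-iid split described in the Dataset Splits row (per-client class distribution drawn from Dir(δ, C), with δ ∈ {0.6, 0.3} and 100 clients) can be sketched as below. This is a generic reconstruction of that partitioning scheme, not the authors' code; the function name `dirichlet_partition` and the proportional-splitting details are assumptions.

```python
import numpy as np

def dirichlet_partition(labels, num_clients=100, num_classes=10, delta=0.3, seed=0):
    """Partition sample indices across clients by drawing each client's
    label distribution from Dir(delta); smaller delta means more label skew."""
    rng = np.random.default_rng(seed)
    # p[k, c]: client k's probability mass on class c
    p = rng.dirichlet(np.full(num_classes, delta), size=num_clients)
    client_idx = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx_c = np.flatnonzero(labels == c)
        rng.shuffle(idx_c)
        # split class-c samples proportionally to each client's mass on c
        shares = p[:, c] / p[:, c].sum()
        cuts = (np.cumsum(shares)[:-1] * len(idx_c)).astype(int)
        for k, part in enumerate(np.split(idx_c, cuts)):
            client_idx[k].extend(part.tolist())
    return client_idx
```

Every sample is assigned to exactly one client, and lowering `delta` from 0.6 to 0.3 concentrates each client's data on fewer classes, matching the paper's moderate-to-high heterogeneity settings.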