Provably Near-Optimal Federated Ensemble Distillation with Negligible Overhead

Authors: Won-Jun Jang, Hyeon-Seo Park, Si-Hyeon Lee

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments on various image classification tasks demonstrate that the proposed method significantly outperforms baselines. Furthermore, we show that the additional communication cost, client-side privacy leakage, and client-side computational overhead introduced by our method are negligible, both in scenarios with and without a pre-existing server dataset.
Researcher Affiliation Academia School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea. Correspondence to: Si-Hyeon Lee <EMAIL>.
Pseudocode Yes Algorithm 1 Federated learning with K clients for T communication rounds, with ensemble distillation exploiting unlabeled dataset on the server. Algorithm 2 FedGO algorithm with K clients for T communication rounds. Algorithm 3 Discriminator update for Ed epochs.
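The ensemble-distillation step named in Algorithm 1 (the server distills from client predictions on its unlabeled dataset) can be sketched as follows. This is a minimal numpy illustration, not the authors' implementation; the function names and the simple averaging of client predictive distributions are assumptions for exposition.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over the class axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ensemble_distillation_targets(client_logits):
    """Average the K clients' predictive distributions on the unlabeled
    server data to form soft labels for server-side distillation.

    client_logits: list of K arrays of shape (N, C), one per client.
    Returns an (N, C) array of soft targets (rows sum to 1).
    """
    probs = softmax(np.stack(client_logits), axis=-1)  # (K, N, C)
    return probs.mean(axis=0)                          # (N, C)
```

The server would then minimize a KL-divergence (or cross-entropy) loss between its own predictions and these soft targets; how the client outputs are weighted is the part that varies between plain ensemble distillation and the paper's FedGO variant.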
Open Source Code Yes For ease of reproduction, our code is open-sourced (https://github.com/pupiu45/FedGO).
Open Datasets Yes We employed datasets CIFAR-10/100 (Krizhevsky, 2009) (MIT license) and downsampled ImageNet100 (ImageNet100 dataset; Chrabaszcz et al., 2017).
Dataset Splits Yes Unless specified otherwise, the entire client dataset corresponds to half of the specified client dataset (half for each class), and each client dataset is sampled from the entire client dataset according to Dirichlet(α), akin to setups in Lin et al. (2020); Cho et al. (2022). α is set to 0.1 and 0.05 to represent data-heterogeneous scenarios. The server dataset corresponds to half of the specified server dataset (half for each class) without labels. ... Table 3. Server test accuracy (%) of our Fed GO and baselines on three image datasets at the 100-th communication round.
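The Dirichlet(α) client partitioning described above (with α = 0.1 or 0.05 for data heterogeneity) is a standard recipe: for each class, draw per-client proportions from Dirichlet(α) and split that class's samples accordingly. A minimal sketch, assuming label arrays and illustrative function names (not the authors' code):

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with per-class Dirichlet(alpha)
    proportions. Smaller alpha gives more heterogeneous client datasets."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Share of this class that each client receives.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for k, part in enumerate(np.split(idx, cuts)):
            client_indices[k].extend(part.tolist())
    return client_indices
```

With α = 0.1 most clients end up holding samples from only a few classes, which is the heterogeneous regime the experiments target.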
Hardware Specification Yes All experiments were conducted in Python 3.8.12 environment using a 64-core Intel 2.90GHz Xeon Gold 6226R CPU with 512GB memory, and an RTX 3090 GPU.
Software Dependencies Yes All experiments were conducted in Python 3.8.12 environment using a 64-core Intel 2.90GHz Xeon Gold 6226R CPU with 512GB memory, and an RTX 3090 GPU. We also implemented the algorithms using PyTorch with version 1.11.0.
Experiment Setup Yes During the ensemble distillation process, we trained both clients and server with the Adam optimizer (Kingma & Ba, 2015) at a learning rate of 0.001 with batch size 64, without weight decay. The (β1, β2) parameters for Adam were set to (0.9, 0.999). Additionally, we applied cosine annealing (Loshchilov & Hutter, 2022) to decay the server learning rate until the final communication round T = 100 as in Lin et al. (2020), except for the results of F.3 and F.5. For the client and server classifier training epochs, we performed a grid search to find the optimal number of training epochs. The initial grid was {5, 10, 30, 50}, and the experiments were conducted with 30 client epochs and 10 server epochs (Es = 10) for CIFAR-10/100. To leverage the increased number of steps due to the additional number of data, experiments on ImageNet100 were conducted with 10 client classifier epochs and 3 server classifier epochs (Es = 3).
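The cosine-annealed server learning rate described above (base rate 0.001, decayed over T = 100 communication rounds) follows the standard schedule; a minimal sketch of the formula, with illustrative function and parameter names (in PyTorch this is typically `torch.optim.lr_scheduler.CosineAnnealingLR`):

```python
import math

def cosine_annealed_lr(round_t, total_rounds=100, base_lr=1e-3, min_lr=0.0):
    """Cosine-annealed learning rate at communication round round_t:
    starts at base_lr, decays smoothly to min_lr at total_rounds."""
    return min_lr + 0.5 * (base_lr - min_lr) * (
        1 + math.cos(math.pi * round_t / total_rounds)
    )
```

At round 0 this returns the base rate 0.001, at round 50 half of it, and at round 100 the minimum rate, matching the decay-to-the-final-round setup quoted above.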