Uncertainty-Based Extensible Codebook for Discrete Federated Learning in Heterogeneous Data Silos

Authors: Tianyi Zhang, Yu Cao, Dianbo Liu

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments across multiple datasets demonstrate that UEFL outperforms state-of-the-art methods, achieving significant improvements in accuracy (by 3% to 22.1%) and uncertainty reduction (by 38.83% to 96.24%). The source code is available at https://github.com/destiny301/uefl.
Researcher Affiliation Academia (1) University of Minnesota, MN, USA; (2) National University of Singapore, Singapore. Correspondence to: Tianyi Zhang <EMAIL>, Dianbo Liu <EMAIL>.
Pseudocode Yes Algorithm 1 Uncertainty-Based Extensible-Codebook Federated Learning (UEFL)
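The paper's Algorithm 1 is not reproduced here, but its outer loop can be inferred from the quoted setup: train, estimate per-silo uncertainty, and grow the codebook by its initial size whenever uncertainty stays above the threshold γ. The following is a minimal sketch of that control flow only; the function name `uefl_sketch` and the simulated uncertainty values are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

def uefl_sketch(initial_codebook_size=64, gamma=0.1, max_iters=20,
                n_silos=9, rng=None):
    """Hypothetical sketch of the UEFL outer loop: run training iterations,
    score each data silo's uncertainty, and extend the discrete codebook
    until all silos fall below the threshold gamma (or iterations run out)."""
    rng = rng or np.random.default_rng(0)
    codebook_size = initial_codebook_size
    history = []
    for it in range(max_iters):
        # Placeholder for a round of federated training with the current
        # codebook; here per-silo uncertainty is just simulated as noise
        # that shrinks over iterations, standing in for real MC-dropout scores.
        uncertainties = rng.uniform(0, gamma * 2 / (it + 1), size=n_silos)
        history.append(codebook_size)
        if uncertainties.max() <= gamma:
            break  # every silo is confident enough; stop extending
        # Paper: "an equivalent number of codewords added in each subsequent
        # iteration" -> grow the codebook by its initial size.
        codebook_size += initial_codebook_size
    return codebook_size, history
```

The sketch mirrors the reported configuration (initial size 64, up to 20 iterations) but replaces training and uncertainty estimation with stand-ins.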
Open Source Code Yes The source code is available at https://github.com/destiny301/uefl.
Open Datasets Yes we employ a similar technique to introduce feature heterogeneity on five different datasets: MNIST, FMNIST, CIFAR10, GTSRB, and CIFAR100, to validate our framework's robustness. In our experiments, we create three domains by counter-clockwise rotating the datasets by 0° (D1), -50° (D2), and 120° (D3). ... For RGB datasets like GTSRB, CIFAR10, and CIFAR100... And for DG, we adopt a pretrained ResNet18 for both datasets. Initial codebook sizes are set to 32 for MNIST and 64 for the remaining datasets... Besides the regular training with multi-domain data silos, we also test UEFL for the domain generalization (DG) task on the Rotated MNIST (Ghifary et al., 2015) and PACS (Li et al., 2017) datasets
Dataset Splits Yes In our experiments, we create three domains by counter-clockwise rotating the datasets by 0° (D1), -50° (D2), and 120° (D3). We sampled three data silos from each domain (i.e. 9 silos in total); data silos for CIFAR100 contain 4000 images each, while the other datasets consist of 2000 images per silo. Besides the regular training with multi-domain data silos, we also test UEFL for the domain generalization (DG) task on the Rotated MNIST (Ghifary et al., 2015) and PACS (Li et al., 2017) datasets; PACS contains four distinct domains: art painting (A), cartoon (C), photo (P), and sketch (S). ... Specifically, we perform leave-one-domain-out experiments, where we choose one domain as the target domain, train the model on all remaining domains, and evaluate it on the chosen domain. Each source domain is treated as a client.
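The silo construction quoted above (three rotated domains, three silos per domain, 2000 or 4000 images per silo) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: `make_rotated_silos` is a hypothetical helper, and the actual rotation is left as a placeholder since it is dataset- and library-specific (e.g. `torchvision.transforms.functional.rotate`).

```python
import numpy as np

def make_rotated_silos(images, labels, angles=(0, -50, 120),
                       silos_per_domain=3, silo_size=2000, seed=0):
    """Sketch of the multi-domain split: one domain per rotation angle,
    `silos_per_domain` silos sampled (without replacement) per domain,
    giving 3 x 3 = 9 silos in the paper's configuration."""
    rng = np.random.default_rng(seed)
    silos = []
    for angle in angles:
        for _ in range(silos_per_domain):
            idx = rng.choice(len(images), size=silo_size, replace=False)
            # Placeholder: a real pipeline would rotate the sampled images
            # counter-clockwise by `angle` degrees before storing them.
            silos.append({"angle": angle, "x": images[idx], "y": labels[idx]})
    return silos
```

With the paper's settings, `silo_size` would be 4000 for CIFAR100 and 2000 for the other datasets.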
Hardware Specification Yes These experiments are performed on a machine with two NVIDIA A6000 GPUs.
Software Dependencies No The paper does not explicitly mention specific software dependencies (e.g., programming languages, libraries, or frameworks) with version numbers used for implementing the experiments.
Experiment Setup Yes Initial codebook sizes are set to 32 for MNIST and 64 for the remaining datasets, with an equivalent number of codewords added in each subsequent iteration. While additional iterations may converge within 5 rounds, we extend this to 20 for enhanced experimental clarity. The uncertainty evaluation is conducted 20 times using a dropout rate of 0.1, with thresholds γ set at 0.3 for MNIST, 0.1 for FMNIST, GTSRB, and CIFAR100, and 0.2 for CIFAR10, to fine-tune performance.
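The quoted setup evaluates uncertainty 20 times with dropout rate 0.1 and compares it against a threshold γ, which is the standard MC-dropout recipe. A minimal sketch of that scoring step is below; the exact uncertainty measure used by the paper is not quoted here, so this assumes predictive entropy of the MC-averaged softmax, and both `mc_dropout_uncertainty` and the toy model are hypothetical.

```python
import numpy as np

def mc_dropout_uncertainty(logits_fn, x, T=20, seed=0):
    """MC-dropout sketch: run the stochastic forward pass T times (dropout
    kept active), average the softmax outputs, and report the mean
    predictive entropy as the uncertainty score to compare against gamma."""
    rng = np.random.default_rng(seed)
    probs = []
    for _ in range(T):
        z = logits_fn(x, rng)                     # stochastic logits
        z = z - z.max(axis=-1, keepdims=True)     # numerically stable softmax
        p = np.exp(z)
        p /= p.sum(axis=-1, keepdims=True)
        probs.append(p)
    p_bar = np.mean(probs, axis=0)                # predictive mean over T passes
    entropy = -(p_bar * np.log(p_bar + 1e-12)).sum(axis=-1)
    return entropy.mean()

def toy_logits(x, rng, p_drop=0.1):
    """Toy stand-in for a network forward pass with dropout rate 0.1."""
    mask = rng.random(x.shape) > p_drop
    return x * mask
```

In the paper's loop, a silo whose score exceeds γ (e.g. 0.2 for CIFAR10) would trigger a codebook extension.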