Certifiably Robust Model Evaluation in Federated Learning under Meta-Distributional Shifts

Authors: Amir Najafi, Samin Mahdizadeh Sani, Farzan Farnia

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Our work is mainly theoretical; in any case, we present a series of experiments on real-world datasets to show the tightness and computability of our bounds in practice. First, we outline our client generation model and present a number of non-robust risk CDF guarantees. A more complete set of experiments with complementary explanations can be found in Appendix G. We simulated a federated learning scenario with n = 1000 nodes, where each node contains 1000 local samples. The experiments were conducted using four different datasets: CIFAR-10 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), EMNIST (Cohen et al., 2017), and ImageNet (Russakovsky et al., 2015).
Researcher Affiliation Academia 1Department of Computer Engineering, Sharif University of Technology, Tehran, Iran (Corresponding author) 2Department of Electrical and Computer Engineering, University of Tehran, Tehran, Iran 3Department of Computer Science and Engineering, The Chinese University of Hong Kong (CUHK), Hong Kong. Correspondence to: Amir Najafi <EMAIL>, Samin Mahdizadeh Sani <EMAIL>, Farzan Farnia <EMAIL>.
Pseudocode Yes Algorithm 1 Server-side Bisection Algorithm
Require: K, ε, δ; input h, tolerance Δ > 0, and a poly(K, log(1/Δ)) query budget for Q̂V_k, for all k ∈ [K]
1: Initialize a ← min ℓ(·) (or 0), b ← max ℓ(·) (or 1)
2: while b − a > Δ do
3:   t ← (a + b)/2
4:   Solve convex feasibility problem:
5:   Find ρ_1, …, ρ_K ≥ ε/K such that
6:   Σ_{k∈[K]} ρ_k ≤ ε(1 + √(1/δ)/K)
7:   (1/K) Σ_{k∈[K]} Q̂V_k(h, ρ_k) ≥ t
8:   if problem is feasible then
9:     a ← t
10:   else
11:     b ← t
12:   end if
13: end while
output: upper bound b
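The bisection routine quoted above can be sketched in Python. Purely for illustration, each client's certified quantile estimate Q̂V_k is assumed to be affine and non-decreasing in its radius ρ_k (intercept c_k, slope s_k), so the inner convex feasibility step collapses to a closed-form greedy budget allocation; the `slack` parameter stands in for the paper's δ-dependent budget inflation. All names and the affine assumption are illustrative, not the paper's actual oracle.

```python
import numpy as np

def max_avg_qv(c, s, eps, slack):
    """Maximise (1/K) * sum_k (c_k + s_k * rho_k) subject to
    rho_k >= eps/K and sum_k rho_k <= eps * (1 + slack).
    With an affine, non-decreasing qv_k, the spare budget all
    goes to the client with the steepest slope s_k."""
    K = len(c)
    rho = np.full(K, eps / K)                  # lower bounds rho_k >= eps/K
    spare = eps * (1.0 + slack) - rho.sum()    # remaining shift budget
    rho[int(np.argmax(s))] += spare            # greedy: steepest client
    return float(np.mean(c + s * rho))

def bisection_upper_bound(c, s, eps, slack, tol=1e-4, a=0.0, b=1.0):
    """Sketch of Algorithm 1: bisect on the threshold t; t is
    'feasible' when some admissible allocation of radii drives the
    average quantile estimate to at least t."""
    while b - a > tol:
        t = 0.5 * (a + b)
        if max_avg_qv(c, s, eps, slack) >= t:  # feasible: bound lies above t
            a = t
        else:
            b = t
    return b                                   # certified upper bound
```

For instance, with two clients `c = [0.2, 0.3]`, `s = [1.0, 2.0]`, `eps = 0.1`, `slack = 0.5`, the spare budget lands on the second client and the bisection converges to the attainable maximum of the average quantile.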
Open Source Code Yes The project code is available at: github.com/samin-mehdizadeh/Robust-Evaluation-DKW
Open Datasets Yes The experiments were conducted using four different datasets: CIFAR-10 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), EMNIST (Cohen et al., 2017), and ImageNet (Russakovsky et al., 2015).
Dataset Splits Yes We simulated a federated learning scenario with n = 1000 nodes, where each node contains 1000 local samples. The experiments were conducted using four different datasets: CIFAR-10 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), EMNIST (Cohen et al., 2017), and ImageNet (Russakovsky et al., 2015). ... Figure 4 illustrates our bounds on the risk CDF of unseen clients with no shifts. We selected 100 nodes from the population and considered 400 other nodes as unseen clients.
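The node-generation and split described in this excerpt can be sketched as follows; the dataset size and the with-replacement sampling are assumptions for illustration, not details confirmed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_nodes, samples_per_node = 1000, 1000
dataset_size = 50_000  # e.g. the CIFAR-10 training set (assumed)

# Each simulated client holds 1000 local sample indices
# (drawn with replacement here purely for illustration).
clients = [rng.choice(dataset_size, size=samples_per_node)
           for _ in range(n_nodes)]

# Split the population: 100 observed nodes vs. 400 unseen clients.
perm = rng.permutation(n_nodes)
observed, unseen = perm[:100], perm[100:500]
```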
Hardware Specification No No specific hardware details are provided in the paper.
Software Dependencies No No specific software dependencies with version numbers are listed in the paper.
Experiment Setup Yes Feature Distribution Shift: ... The standard deviation varies based on the dataset: 0.05 for CIFAR-10 and SVHN, 0.1 for EMNIST, and 0.01 for ImageNet. ... Label Distribution Shift: ... In our experiments, we use α = 0.4. ... Resolutions: ... The Dirichlet α coefficients for the first (source) meta-distribution range from 0.4 to 0.7 for the four lower resolutions and from 0.7 to 1 for the four higher resolutions. For the second (target) meta-distribution, the ranges are reversed: 0.7 to 1 for the lower resolutions and 0.4 to 0.7 for the higher resolutions. ... Colors: The color intensity of the images varies from 0.00 (gray-scale) to 1.00 (fully colored). For the source meta-distribution, the α coefficients range from 0 to 0.5 for images with color intensity below 0.5, and from 0.5 to 1 for images above 0.5.
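The two shift mechanisms quoted above can be sketched as follows: a symmetric Dirichlet draw with α = 0.4 for per-client label proportions, and additive Gaussian noise with the per-dataset standard deviations from the setup description. The toy image batch and client count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Label distribution shift: per-client class proportions drawn from a
# symmetric Dirichlet with alpha = 0.4 (smaller alpha -> more skew).
n_classes, alpha, n_clients = 10, 0.4, 5
class_props = rng.dirichlet(np.full(n_classes, alpha), size=n_clients)

# Feature distribution shift: additive Gaussian noise whose standard
# deviation depends on the dataset (values from the setup description).
noise_std = {"cifar10": 0.05, "svhn": 0.05, "emnist": 0.1, "imagenet": 0.01}
x = rng.random((4, 32, 32, 3))  # toy image batch, values in [0, 1)
x_shifted = x + rng.normal(0.0, noise_std["cifar10"], size=x.shape)
```

Each row of `class_props` sums to one and gives the label mix of one simulated client; drawing local samples according to those proportions induces the label shift across clients.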