Can Private Machine Learning Be Fair?
Authors: Joseph Rance, Filip Svoboda
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that current SOTA methods for privately and fairly training models are unreliable in many practical scenarios. Specifically, we (1) introduce a new type of adversarial attack that seeks to introduce unfairness into private model training, and (2) demonstrate that the use of methods for training on private data that are robust to adversarial attacks often leads to unfair models, regardless of the use of fairness-enhancing training methods. ... Experimental results. We test the fairness attack for the datasets described in table 2 ... All experiments were performed on 2 NVIDIA RTX 2080 GPUs. We record the change in fairness after the attack is introduced for each dataset-defence combination. The attack is effective at introducing unfairness into all three tasks. |
| Researcher Affiliation | Academia | Joseph Rance, Filip Svoboda Department of Computer Science & Technology University of Cambridge Cambridge, United Kingdom EMAIL |
| Pseudocode | No | The paper describes algorithms and attacks in prose and mathematical notation (e.g., Theorem 1, equations for FedAvg), but it does not present any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/Joseph-Rance/unfair-fl |
| Open Datasets | Yes | Experimental results. We test the fairness attack for the datasets described in table 2 (Becker and Kohavi 1996; Krizhevsky 2009; Pushshift 2017). These datasets were selected to cover a range of tasks and to provide clear comparison with previous work (Bagdasaryan et al. 2019; Bhagoji et al. 2019; Wang et al. 2020; Nguyen et al. 2023; McMahan et al. 2023). ... Table 2: UCI Census, CIFAR-10, Reddit |
| Dataset Splits | No | The paper discusses data distribution among clients (e.g., 'i.i.d. data', 'log-normal label distribution across the clients', splitting inputs for groups), but it does not provide specific train/validation/test dataset splits (e.g., percentages, sample counts, or references to standard splits for these datasets) that are generally used for model evaluation. |
| Hardware Specification | Yes | All experiments were performed on 2 NVIDIA RTX 2080 GPUs. |
| Software Dependencies | No | The paper mentions models like ResNet-50 and LSTM and tokenizers like albert-base-v2, but it does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Table 2: We train clients for 10, 2, and 5 epochs on i.i.d. data, for a total of 40, 120, and 100 rounds for the Census, CIFAR, and Reddit datasets respectively. ... We select hyperparameters by performing a grid search over all reasonable combinations at multiple levels of granularity and present the median result across three trials in table 1. |
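The selection protocol quoted in the final row (a grid search over hyperparameter combinations, reporting the median result across three trials) can be sketched as below. This is a minimal illustration only: the metric function, grid names, and values are assumptions, not the paper's actual search space or training pipeline.

```python
import itertools
import statistics

def train_and_evaluate(lr, batch_size, seed):
    """Hypothetical stand-in for one full training run.

    Returns a scalar score; the real pipeline would train a model and
    measure fairness/accuracy on held-out data for the given seed.
    """
    return 1.0 / (1.0 + abs(lr - 0.01)) + 0.0001 * seed + 0.0 * batch_size

# Illustrative hyperparameter grid (names and values are placeholders).
grid = {"lr": [0.001, 0.01, 0.1], "batch_size": [32, 64]}

results = {}
for combo in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), combo))
    # Median over three trials, matching the protocol described above.
    trials = [train_and_evaluate(**params, seed=s) for s in range(3)]
    results[tuple(sorted(params.items()))] = statistics.median(trials)

best = max(results, key=results.get)
print("best hyperparameters:", dict(best))
```

Taking the median rather than the mean makes the reported result robust to a single diverging trial, which matters when each grid point is only run three times.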