Fairlearn: Assessing and Improving Fairness of AI Systems

Authors: Hilde Weerts, Miroslav Dudík, Richard Edgar, Adrin Jalali, Roman Lutz, Michael Madaio

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | One of the key goals of the fairlearn library is to support fairness assessment. The goal of fairness assessment is to answer the question: Which groups of people may be disproportionately negatively impacted by an AI system and in what ways? In the context of allocation and quality-of-service harms, this means evaluating how well the system performs for different population groups by calculating some performance metric, like an error rate, on different slices of data. This is called disaggregated evaluation (Barocas et al., 2021).
Researcher Affiliation | Collaboration | The authors are the current maintainers of Fairlearn, and additionally have the following affiliations: (1) Eindhoven University of Technology, (2) Microsoft.
Pseudocode | No | The paper describes various algorithms (e.g., Correlation Remover, Exponentiated Gradient, Adversarial Classifier, Threshold Optimizer) textually but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Both the library and the learning resources are licensed under the MIT license and available online (https://github.com/fairlearn/fairlearn; https://fairlearn.org).
Open Datasets | Yes | The data sets provided in the module fairlearn.datasets also serve an educational role, as we use them to highlight sociotechnical aspects of fairness, with sections of the user guide highlighting fairness-related issues with several popular benchmark data sets.
Dataset Splits | No | The paper mentions disaggregated evaluation on "different slices of data" but does not provide specific training/test/validation dataset splits (e.g., percentages, sample counts, or references to predefined splits) needed for reproduction.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running experiments.
Software Dependencies | No | The paper mentions popular libraries like scikit-learn, pandas, matplotlib, TensorFlow, and PyTorch with citations, but does not specify their version numbers in the text.
Experiment Setup | No | This paper describes the Fairlearn library and its functionalities; it does not present specific experiments conducted by the authors with their corresponding hyperparameters or training configurations within this document.
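The disaggregated evaluation quoted in the Research Type row can be illustrated with a minimal sketch in plain pandas and scikit-learn: compute one metric on each slice of the data defined by a sensitive feature, then compare slices. The data below is synthetic and chosen only for illustration.

```python
import pandas as pd
from sklearn.metrics import accuracy_score

# Synthetic predictions with one sensitive feature (illustrative data only).
df = pd.DataFrame({
    "y_true": [1, 0, 1, 1, 0, 1, 0, 0],
    "y_pred": [1, 0, 0, 1, 0, 1, 1, 1],
    "group":  ["a", "a", "a", "a", "b", "b", "b", "b"],
})

# Disaggregated evaluation: the same metric, computed per slice of the data.
by_group = {g: accuracy_score(sub["y_true"], sub["y_pred"])
            for g, sub in df.groupby("group")}
overall = accuracy_score(df["y_true"], df["y_pred"])

# The largest between-group difference is a simple one-number summary.
gap = max(by_group.values()) - min(by_group.values())
```

Fairlearn packages this pattern as `fairlearn.metrics.MetricFrame` (constructed with `metrics=`, `y_true=`, `y_pred=`, and `sensitive_features=`), which exposes the same quantities through its `.overall` and `.by_group` attributes and its `.difference()` method.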
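Of the mitigation algorithms the Pseudocode row notes are only described textually, the Correlation Remover has the simplest core idea: residualize each feature against the sensitive feature by least squares so the linear correlation vanishes. The sketch below is my own toy rendering of that idea for a single sensitive column (the function name and signature are assumptions, not Fairlearn's API, which is exposed as `fairlearn.preprocessing.CorrelationRemover` and also handles multiple sensitive columns).

```python
import numpy as np

def remove_correlation(X, s, alpha=1.0):
    """Toy correlation remover (not Fairlearn's implementation).

    Subtracts each feature column's least-squares projection onto the
    centered sensitive column s, so the result has zero sample
    correlation with s when alpha=1.0; alpha=0.0 leaves X unchanged.
    """
    s_c = s - s.mean()                       # centered sensitive feature
    X_c = X - X.mean(axis=0)                 # centered feature columns
    beta = (s_c @ X_c) / (s_c @ s_c)         # per-feature regression slope
    return X - alpha * np.outer(s_c, beta)   # remove the projected component
```

The `alpha` parameter mirrors the usual trade-off knob in such preprocessors: intermediate values remove only part of the correlation, trading fairness of the representation against fidelity to the original features.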