reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

The Matrix Reloaded: Towards Counterfactual Group Fairness in Machine Learning

Authors: Mariana Pinto, Andre V Carreiro, Pedro Madeira, Alberto Lopez, Hugo Gamboa

DMLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate their utility and complementarity with standard group fairness metrics through experiments on real-world datasets. Our results show that domain knowledge is key, and that our metrics can reveal subtle biases that traditional bias evaluation strategies may overlook, providing a more nuanced understanding of potential model bias.
Researcher Affiliation	Collaboration	Associacao Fraunhofer Portugal Research AICOS, Porto, Portugal; INCMLab, Imprensa Nacional Casa da Moeda, Lisbon, Portugal; Mathematics Department and CEMAPRE, ISEG, University of Lisbon, Portugal; Laboratory for Instrumentation, Biomedical Eng. and Radiation Physics (LIBPhys-UNL), NOVA School of Science and Technology, Caparica, Portugal
Pseudocode	Yes	Appendix A. Pseudo-Code of the Counterfactual Generation Process; Algorithm 1: Counterfactual Generation
Open Source Code	No	The text does not explicitly state that the authors provide open-source code for the methodology described in this paper, nor does it provide a link to such a repository. It only refers to code provided by a third party (Yeom (2020)) which was used for comparison.
Open Datasets	Yes	Heart Disease is a public dataset from the UCI Machine Learning Repository (Asuncion and Newman (2007)).; Adult (census income) dataset. https://archive.ics.uci.edu/ml/datasets/Adult, 1996. UCI Machine Learning Repository.; COMPAS recidivism risk score data and analysis. https://www.propublica.org/datastore/dataset/compas-recidivism-risk-score-data-and-analysis, 2017. ProPublica Data Store.
Dataset Splits	Yes	To ensure robust metrics and fair evaluation across different model architectures, we employed a cross-validation approach. Each dataset was divided into five train-test folds, with each fold used for testing exactly once, resulting in five distinct models trained on the remaining folds.
Hardware Specification	Yes	Machine Specs. Processor: 12th Gen Intel(R) Core(TM) i7-1255U 1.70 GHz; RAM: 16.0 GB
Software Dependencies	No	The paper references the use of the LightGBM and FairGBM algorithms, but does not provide specific version numbers for these libraries or any other key software components used in their implementation.
Experiment Setup	Yes	The proposed algorithm starts by handling continuous variables. The generation begins by computing the Cumulative Distribution Functions (CDFs)... If the difference surpasses a predefined threshold (τ = 0.5 by default), the feature value flips, remaining unchanged otherwise. The current model in practice was optimised for prompt positive outcome identification, using a minimal eight-feature set. It relies on a Random Forest classifier, trained with Threshold Optimisation with 5-fold cross-validation to maximise recall...
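
The five-fold protocol quoted in the Dataset Splits row can be illustrated with a minimal sketch. It is not the authors' code: the function name `five_fold_splits`, the shuffling step, the seed, and the sample count are assumptions chosen for illustration. It shows only that each sample lands in exactly one test fold and that each of the five models would be trained on the remaining four folds.

```python
import random


def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold.

    Hypothetical helper: shuffling and the seed are assumptions,
    not details reported in the paper.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into n_folds disjoint folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        # Training set is every index from the other folds.
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


# Illustrative sample count only; the real datasets are far larger.
splits = list(five_fold_splits(n_samples=23))
for train, test in splits:
    print(len(train), len(test))
```

A stratified split (preserving label or group proportions per fold) may matter for fairness evaluation; the quoted text does not say whether the authors used one, so the sketch uses a plain shuffle.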