Optimized Score Transformation for Consistent Fair Classification

Authors: Dennis Wei, Karthikeyan Natesan Ramamurthy, Flavio P. Calmon

JMLR 2021

Reproducibility assessment — each variable is listed with its result and the supporting LLM response:
Research Type: Experimental — Comprehensive experiments comparing the Fair Score Transformer (FST) to 10 existing methods show that FST has advantages on score-based metrics such as Brier score and AUC while remaining competitive on binary label-based metrics such as accuracy. "We have conducted comprehensive experiments, reported in Section 6 and Appendix C, comparing FST to 10 existing methods, a number that compares favorably to recent meta-studies (Friedler et al., 2019)."
Researcher Affiliation: Collaboration — Dennis Wei and Karthikeyan Natesan Ramamurthy, IBM Research, 1101 Kitchawan Road, Yorktown Heights, NY 10598, USA; Flavio P. Calmon, John A. Paulson School of Engineering and Applied Sciences, Harvard University, 150 Western Ave, Allston, MA 02134, USA.
Pseudocode: Yes — "Under the first decomposition, application of the scaled ADMM algorithm (Boyd et al., 2011, Section 3.1.1) to (19) yields the following three steps in each iteration k = 0, 1, ...:

$$\mu^{(k+1)}(x_i) = \arg\min_{\mu}\; \frac{1}{n}\, g\big(\mu;\, \hat{r}(x_i)\big) + \frac{\rho}{2}\big(\mu - (\lambda^{(k)})^T \hat{f}(x_i) + c^{(k)}(x_i)\big)^2, \quad i = 1, \dots, n \tag{20a}$$

$$\lambda^{(k+1)} = \arg\min_{\lambda}\; \epsilon \|\lambda\|_1 + \frac{\rho}{2} \sum_{i=1}^{n} \big(\mu^{(k+1)}(x_i) - \lambda^T \hat{f}(x_i) + c^{(k)}(x_i)\big)^2 \tag{20b}$$

$$c^{(k+1)}(x_i) = c^{(k)}(x_i) + \mu^{(k+1)}(x_i) - (\lambda^{(k+1)})^T \hat{f}(x_i), \quad i = 1, \dots, n. \tag{20c}$$"
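The three-step structure quoted above (primal update, l1-regularized update, scaled dual update) is the standard scaled ADMM pattern. A minimal sketch on a scalar toy problem illustrates it; this is not the paper's FST implementation, and the objective, penalty rho, and soft-threshold update are assumptions chosen for a simple lasso-type example:

```python
# Scaled ADMM on a toy problem: minimize (1/2)(x - a)^2 + eps*|z|  s.t. x = z.
# Mirrors the three updates (20a)-(20c): a smooth primal step, an
# l1-regularized step solved by soft-thresholding, and a scaled dual update.

def soft_threshold(v, t):
    """Proximal operator of t*|.| (closed-form l1 update)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def admm_scalar_lasso(a, eps, rho=1.0, iters=100):
    x = z = u = 0.0
    for _ in range(iters):
        # x-update: argmin (1/2)(x - a)^2 + (rho/2)(x - z + u)^2
        x = (a + rho * (z - u)) / (1.0 + rho)
        # z-update: argmin eps*|z| + (rho/2)(x - z + u)^2
        z = soft_threshold(x + u, eps / rho)
        # scaled dual update, the analogue of step (20c)
        u = u + x - z
    return z

# The fixed point is the soft-thresholded value soft_threshold(a, eps).
print(admm_scalar_lasso(2.0, 0.5))  # prints 1.5
```

The iterates here converge geometrically to the known closed-form solution, which makes the toy useful for checking the three-step wiring before scaling up to the vector-valued problem in (20a)-(20c).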
Open Source Code: No — The paper discusses third-party code used for comparison (e.g., reductions, FERM) and provides links to it, but it does not state that the authors' own implementation of the Fair Score Transformer method is available, and no link to that implementation is given.
Open Datasets: Yes — "Four data sets were used, the first three of which are standard in the fairness literature: 1) Adult Income, 2) ProPublica's COMPAS recidivism, 3) German credit risk, 4) Medical Expenditure Panel Survey (MEPS). Specifically, we used versions pre-processed by an open-source library for algorithmic fairness (Bellamy et al., 2018)."
Dataset Splits: Yes — "Each data set was randomly split 10 times into training (75%) and test (25%) sets and all methods were subject to the same splits."
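The split protocol can be sketched with the standard library alone; this is an illustrative stand-in, not the authors' code, and the fixed seed is an assumption made so that every method would see identical splits:

```python
import random

def repeated_splits(n_samples, n_repeats=10, train_frac=0.75, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated random splits.

    Illustrative stand-in for the protocol of 10 random 75%/25%
    train/test splits shared across all compared methods.
    """
    rng = random.Random(seed)  # fixed seed: all methods get the same splits
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        rng.shuffle(indices)
        cut = int(train_frac * n_samples)
        yield indices[:cut], indices[cut:]  # slices copy, so yields are stable

# Sanity check on a toy dataset of 100 rows: each split is a 75/25
# partition of the full index set.
for train_idx, test_idx in repeated_splits(100):
    assert len(train_idx) == 75 and len(test_idx) == 25
    assert set(train_idx).isdisjoint(test_idx)
```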
Hardware Specification: Yes — "Experiments were performed on a machine running Ubuntu OS with 32 cores, and 64 GB RAM."
Software Dependencies: No — The paper mentions using "scikit-learn (Pedregosa et al., 2011)" for base classifiers and references "http://cvxopt.org/" for a generic convex optimization solver, but it provides no version numbers for these components, which a reproducible description requires.
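For illustration, a pinned requirements file is the kind of artifact that would close this gap. The version numbers below are hypothetical placeholders (the paper itself gives none) and do not describe the authors' actual environment:

```
# requirements.txt (sketch) -- versions are illustrative placeholders only,
# NOT taken from the paper
scikit-learn==0.21.3
cvxopt==1.2.5
```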
Experiment Setup: Yes — "5-fold cross-validation to select parameters for LR (regularization parameter C from [10^-4, 10^4]) and GBM (minimum number of samples per leaf from {5, 10, 15, 20, 30}) was done only once per training set. All other parameters were set to the scikit-learn defaults."
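The selection protocol can be sketched with plain 5-fold index generation plus the reported grids; this is a toy stand-in (the actual experiments used scikit-learn), and the log-spaced grid resolution for C is an assumption, since the paper only gives the range [10^-4, 10^4]:

```python
def kfold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous (train, validation) folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, val))
        start += size
    return folds

# Grids mirroring the reported search spaces:
# C on a log grid spanning [1e-4, 1e4] for logistic regression (LR);
# minimum samples per leaf for gradient boosting (GBM).
lr_grid = [10.0 ** e for e in range(-4, 5)]   # 1e-4 ... 1e4
gbm_grid = [5, 10, 15, 20, 30]

# One CV pass per training set: each grid value would be scored on the
# 5 folds and the best kept (the scoring step is omitted in this sketch).
folds = kfold_indices(100, k=5)
assert len(folds) == 5 and all(len(val) == 20 for _, val in folds)
```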