Interpretable algorithmic fairness in structured and unstructured data

Authors: Hari Bandi, Dimitris Bertsimas, Thodoris Koukouvinos, Sofie Kupiec

JMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We present four case studies, demonstrating that our approach often outperforms state-of-the-art methods in terms of fairness and meritocracy. In the case of unstructured data, we present two case studies on image classification, demonstrating that our method outperforms state-of-the-art approaches in terms of fairness. Moreover, we note that the decrease in accuracy over the nominal model is 3.31% on structured data and 0.65% on unstructured data.
Researcher Affiliation Academia Hari Bandi EMAIL Massachusetts Institute of Technology 77 Massachusetts Avenue, Cambridge, MA 02139, USA; Dimitris Bertsimas EMAIL Sloan School of Management, Massachusetts Institute of Technology 77 Massachusetts Avenue, Cambridge, MA 02139, USA; Thodoris Koukouvinos EMAIL Operations Research Center, Massachusetts Institute of Technology 77 Massachusetts Avenue, Cambridge, MA 02139, USA; Sofie Kupiec EMAIL Massachusetts Institute of Technology 77 Massachusetts Avenue, Cambridge, MA 02139, USA
Pseudocode Yes Algorithm 1 Fair training of a classifier. Input: Training data (X, y), sensitive attribute S, initial guesses θ_0, z_0, parameters T, α, β, ϵ. Output: Optimal model parameters θ_K. 1: Compute τ_1, τ_2 from (3). 2: Initialize θ^1, z^1 = θ_0, z_0. 3: Initialize k = 1. 4: for t = 1 : T do 5: for c ∈ C_t do 6: θ^{k+1} ← θ^k − α ∇_c f(θ^k). 7: z̃^{t+1}_c ← z^t_c − β ∇f(z^t_c). 8: end for 9: z^{t+1} = Proj_{Z_{τ_1,τ_2}}(z̃^{t+1}). 10: end for 11: Return θ^K.
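The quoted updates can be sketched as a small alternating projected-gradient loop. This is a hedged illustration only: the projection set is modeled as a simple box [τ1, τ2] and the gradient oracles are user-supplied callables, which are simplified stand-ins for the paper's actual definitions of Z_{τ1,τ2} and f.

```python
import numpy as np

def project_box(z, tau1, tau2):
    """Hypothetical projection onto Z_{tau1, tau2}, modeled here as the
    box [tau1, tau2]^n; the paper's actual feasible set may differ."""
    return np.clip(z, tau1, tau2)

def fair_train(X, y, grad_theta, grad_z, theta0, z0,
               T=20, alpha=1e-2, beta=1e-2, tau1=0.0, tau2=1.0):
    """Sketch of the alternating scheme in Algorithm 1: a gradient step
    on the model parameters theta, a gradient step on the auxiliary
    variables z, then projection of z back onto the feasible set."""
    theta = np.asarray(theta0, dtype=float).copy()
    z = project_box(np.asarray(z0, dtype=float), tau1, tau2)
    for t in range(T):
        theta = theta - alpha * grad_theta(theta, z, X, y)  # step 6
        z = z - beta * grad_z(theta, z, X, y)               # step 7
        z = project_box(z, tau1, tau2)                      # step 9
    return theta
```

With convex quadratic toy losses, the loop drives theta toward its minimizer while z stays inside the box, which is the qualitative behavior the pseudocode describes.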
Open Source Code Yes A code implementation of our method, including structured and unstructured data examples, can be found in the following GitHub repository: https://github.com/ThKoukouv/Fair_Classification.
Open Datasets Yes We present case studies on four real-world datasets: the Law School Admission Council (LSAC) dataset, the Crime dataset, the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) dataset and the German Credit dataset. We present case studies on two widely used datasets for image classification, CelebFaces Attributes (CelebA) and Labeled Faces in the Wild (LFW).
Dataset Splits Yes We utilize the train/validation/test splits from the Python package EthicML (2022) for structured data and those from PyTorch for unstructured data. All results are averaged over 10 different random seeds. The LSAC dataset: We utilize 14,569 observations for training, 1,868 for validation and 4,361 for testing. The Communities and Crime dataset: We utilize 1,594 observations for training, 119 for validation and 280 for testing. The COMPAS dataset: We utilize 4,933 observations for training, 370 for validation and 864 for testing. The German credit dataset: We utilize 800 observations for training, 60 for validation and 140 for testing. The CelebA dataset: We utilize 150k images for training, 10k images for validation and 20k images for testing. The LFW dataset: We utilize 9.5k images for training, 1k images for validation and 2.5k images for testing.
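Fixed-size splits of this kind can be reproduced with a short helper; note the paper itself takes its splits from the EthicML and PyTorch packages, so the function below (its name, seed handling, and shuffling) is purely illustrative.

```python
import random

def three_way_split(n, n_train, n_val, seed=0):
    """Shuffle n indices with a fixed seed and cut them into
    train / validation / test pieces of the requested sizes."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

# e.g. the LSAC sizes quoted above: 14,569 / 1,868 / 4,361
train, val, test = three_way_split(14569 + 1868 + 4361, 14569, 1868)
```

Fixing the seed per run and repeating over 10 seeds matches the averaging protocol quoted above.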
Hardware Specification No All experiments are conducted in Python using PyTorch (Paszke et al., 2017) for model training and Gurobi (Gurobi Optimization, Inc., 2017) for solving Problem (5). No specific hardware details (such as GPU/CPU models or memory) are provided.
Software Dependencies No All experiments are conducted in Python using PyTorch (Paszke et al., 2017) for model training and Gurobi (Gurobi Optimization, Inc., 2017) for solving Problem (5). The specific version numbers for Python and PyTorch are not mentioned, and while Gurobi has a year (2017) associated with its citation, that is not an explicit version number for the software itself.
Experiment Setup Yes We tune the learning rate and the number of epochs for our approach, the nominal model, and the benchmarks (if applicable) on a validation set based on the SPD metric. Throughout we fix ϵ = 1e-2 for our method. We initialize Algorithm 1 with a random feasible solution z_0 obtained as z_0 = Proj_{Z_{τ_1,τ_2}}(0). For tabular data classification, we use LR as the base architecture and for image classification we use ResNet-18 (He et al., 2016) as the base architecture. Throughout we use the soft margin loss, along with the Adam optimizer (Kingma and Ba, 2015) and a batch size of 64. For our approach, we try the values α, β ∈ {1e-3, 1e-2, 1e-1} and T ∈ {20, 50, 100}. When applying Algorithm 1, we try the values α, β ∈ {1e-3, 1e-2, 1e-1} and T ∈ {20, 50}.
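The SPD-based validation tuning described above can be sketched as follows. SPD is implemented as the standard statistical parity difference (the gap in positive-prediction rates between sensitive groups); `train_fn` and `select_hparams` are hypothetical names standing in for training and predicting with the method under one hyperparameter setting.

```python
import numpy as np
from itertools import product

def spd(y_pred, s):
    """Statistical parity difference: |P(yhat=1 | s=0) - P(yhat=1 | s=1)|."""
    y_pred, s = np.asarray(y_pred), np.asarray(s)
    return abs(y_pred[s == 0].mean() - y_pred[s == 1].mean())

def select_hparams(train_fn, X_val, s_val):
    """Grid search over the (alpha, beta, T) values quoted above, keeping
    the configuration with the smallest validation SPD. `train_fn` is a
    hypothetical callable: (alpha, beta, T, X) -> binary predictions."""
    best, best_spd = None, float("inf")
    for alpha, beta, T in product([1e-3, 1e-2, 1e-1],
                                  [1e-3, 1e-2, 1e-1],
                                  [20, 50]):
        y_pred = train_fn(alpha, beta, T, X_val)
        cur = spd(y_pred, s_val)
        if cur < best_spd:
            best, best_spd = (alpha, beta, T), cur
    return best, best_spd
```

Tuning on SPD rather than accuracy reflects the quoted protocol, where the validation criterion is the fairness metric itself.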