Interpretable algorithmic fairness in structured and unstructured data

Authors: Hari Bandi, Dimitris Bertsimas, Thodoris Koukouvinos, Sofie Kupiec

JMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We present four case studies, demonstrating that our approach often outperforms state-of-the-art methods in terms of fairness and meritocracy. In the case of unstructured data, we present two case studies on image classification, demonstrating that our method outperforms state-of-the-art approaches in terms of fairness. Moreover, we note that the decrease in accuracy over the nominal model is 3.31% on structured data and 0.65% on unstructured data.
Researcher Affiliation Academia Hari Bandi EMAIL Massachusetts Institute of Technology 77 Massachusetts Avenue, Cambridge, MA 02139, USA; Dimitris Bertsimas EMAIL Sloan School of Management, Massachusetts Institute of Technology 77 Massachusetts Avenue, Cambridge, MA 02139, USA; Thodoris Koukouvinos EMAIL Operations Research Center, Massachusetts Institute of Technology 77 Massachusetts Avenue, Cambridge, MA 02139, USA; Sofie Kupiec EMAIL Massachusetts Institute of Technology 77 Massachusetts Avenue, Cambridge, MA 02139, USA
Pseudocode Yes Algorithm 1 Fair training of a classifier. Input: Training data (X, y), sensitive attribute S, initial guesses θ_0, z_0, parameters T, α, β, ϵ. Output: Optimal model parameters θ_K. 1: Compute τ_1, τ_2 from (3). 2: Initialize θ^1, z^1 = θ_0, z_0. 3: Initialize k = 1. 4: for t = 1 : T do 5: for c ∈ C_t do 6: θ^{k+1} ← θ^k − α ∇_c f(θ^k). 7: z̃^{t+1}_c ← z^t_c − β ∇f(z^t_c). 8: end for 9: z^{t+1} = Proj_{Z_{τ_1,τ_2}}(z̃^{t+1}). 10: end for 11: Return θ^K.
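The quoted updates can be sketched as a small alternating projected-gradient loop. This is a hedged illustration only: the projection set is modeled as a simple box [τ1, τ2] and the gradient oracles are user-supplied callables, which are simplified stand-ins for the paper's actual definitions of Z_{τ1,τ2} and f.

```python
import numpy as np

def project_box(z, tau1, tau2):
    """Hypothetical projection onto Z_{tau1, tau2}, modeled here as the
    box [tau1, tau2]^n; the paper's actual feasible set may differ."""
    return np.clip(z, tau1, tau2)

def fair_train(X, y, grad_theta, grad_z, theta0, z0,
               T=20, alpha=1e-2, beta=1e-2, tau1=0.0, tau2=1.0):
    """Sketch of the alternating scheme in Algorithm 1: a gradient step
    on the model parameters theta, a gradient step on the auxiliary
    variables z, then projection of z back onto the feasible set."""
    theta = np.asarray(theta0, dtype=float).copy()
    z = project_box(np.asarray(z0, dtype=float), tau1, tau2)
    for t in range(T):
        theta = theta - alpha * grad_theta(theta, z, X, y)  # step 6
        z = z - beta * grad_z(theta, z, X, y)               # step 7
        z = project_box(z, tau1, tau2)                      # step 9
    return theta
```

With convex quadratic toy losses, the loop drives theta toward its minimizer while z stays inside the box, which is the qualitative behavior the pseudocode describes.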
Open Source Code Yes A code implementation of our method, including structured and unstructured data examples, can be found in the following GitHub repository: https://github.com/ThKoukouv/Fair_Classification.
Open Datasets Yes We present case studies on four real-world datasets: the Law School Admission Council (LSAC) dataset, the Crime dataset, the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) dataset and the German Credit dataset. We present case studies on two widely used datasets for image classification, CelebFaces Attributes (CelebA) and Labeled Faces in the Wild (LFW).
Dataset Splits Yes We utilize the train/validation/test splits from the Python package EthicML (2022) for structured data and those from PyTorch for unstructured data. All results are averaged over 10 different random seeds. The LSAC dataset: We utilize 14,569 observations for training, 1,868 for validation and 4,361 for testing. The Communities and Crime dataset: We utilize 1,594 observations for training, 119 for validation and 280 for testing. The COMPAS dataset: We utilize 4,933 observations for training, 370 for validation and 864 for testing. The German credit dataset: We utilize 800 observations for training, 60 for validation and 140 for testing. The CelebA dataset: We utilize 150k images for training, 10k images for validation and 20k images for testing. The LFW dataset: We utilize 9.5k images for training, 1k images for validation and 2.5k images for testing.
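Fixed-size splits of this kind can be reproduced with a short helper; note the paper itself takes its splits from the EthicML and PyTorch packages, so the function below (its name, seed handling, and shuffling) is purely illustrative.

```python
import random

def three_way_split(n, n_train, n_val, seed=0):
    """Shuffle n indices with a fixed seed and cut them into
    train / validation / test pieces of the requested sizes."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

# e.g. the LSAC sizes quoted above: 14,569 / 1,868 / 4,361
train, val, test = three_way_split(14569 + 1868 + 4361, 14569, 1868)
```

Fixing the seed per run and repeating over 10 seeds matches the averaging protocol quoted above.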
Hardware Specification No All experiments are conducted in Python using PyTorch (Paszke et al., 2017) for model training and Gurobi (Gurobi Optimization, Inc., 2017) for solving Problem (5). No specific hardware details (such as GPU/CPU models or memory) are provided.
Software Dependencies No All experiments are conducted in Python using PyTorch (Paszke et al., 2017) for model training and Gurobi (Gurobi Optimization, Inc., 2017) for solving Problem (5). The specific version numbers for Python and PyTorch are not mentioned, and while Gurobi has a year (2017) associated with its citation, that is not an explicit version number for the software itself.
Experiment Setup Yes We tune the learning rate and the number of epochs for our approach, the nominal model, and the benchmarks (if applicable) on a validation set based on the SPD metric. Throughout we fix ϵ = 1e-2 for our method. We initialize Algorithm 1 with a random feasible solution z_0 obtained as z_0 = Proj_{Z_{τ_1,τ_2}}(0). For tabular data classification, we use LR as the base architecture and for image classification we use ResNet-18 (He et al., 2016) as the base architecture. Throughout we use the soft margin loss, along with the Adam optimizer (Kingma and Ba, 2015) and a batch size of 64. For our approach, we try the values α, β ∈ {1e-3, 1e-2, 1e-1} and T ∈ {20, 50, 100}. When applying Algorithm 1, we try the values α, β ∈ {1e-3, 1e-2, 1e-1} and T ∈ {20, 50}.
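The SPD-based validation tuning described above can be sketched as follows. SPD is implemented as the standard statistical parity difference (the gap in positive-prediction rates between sensitive groups); `train_fn` and `select_hparams` are hypothetical names standing in for training and predicting with the method under one hyperparameter setting.

```python
import numpy as np
from itertools import product

def spd(y_pred, s):
    """Statistical parity difference: |P(yhat=1 | s=0) - P(yhat=1 | s=1)|."""
    y_pred, s = np.asarray(y_pred), np.asarray(s)
    return abs(y_pred[s == 0].mean() - y_pred[s == 1].mean())

def select_hparams(train_fn, X_val, s_val):
    """Grid search over the (alpha, beta, T) values quoted above, keeping
    the configuration with the smallest validation SPD. `train_fn` is a
    hypothetical callable: (alpha, beta, T, X) -> binary predictions."""
    best, best_spd = None, float("inf")
    for alpha, beta, T in product([1e-3, 1e-2, 1e-1],
                                  [1e-3, 1e-2, 1e-1],
                                  [20, 50]):
        y_pred = train_fn(alpha, beta, T, X_val)
        cur = spd(y_pred, s_val)
        if cur < best_spd:
            best, best_spd = (alpha, beta, T), cur
    return best, best_spd
```

Tuning on SPD rather than accuracy reflects the quoted protocol, where the validation criterion is the fairness metric itself.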