Ensuring Fairness Beyond the Training Data

Authors: Debmalya Mandal, Samuel Deng, Suman Jana, Jeannette Wing, Daniel J. Hsu

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on standard machine learning fairness datasets suggest that, compared to the state-of-the-art fair classifiers, our classifier retains fairness guarantees and test accuracy for a large class of perturbations on the test set. Furthermore, our experiments show that there is an inherent trade-off between fairness robustness and accuracy of such classifiers.
Researcher Affiliation | Academia | Debmalya Mandal (EMAIL), Columbia University; Samuel Deng (EMAIL), Columbia University; Suman Jana (EMAIL), Columbia University; Jeannette M. Wing (EMAIL), Columbia University; Daniel Hsu (EMAIL), Columbia University
Pseudocode | Yes | ALGORITHM 1: Meta-Algorithm; ALGORITHM 2: Best Response of the -player; ALGORITHM 3: Approximate Fair Classifier (ApxFair)
Open Source Code | Yes | Our code is available at this GitHub repo: https://github.com/essdeee/Ensuring-Fairness-Beyond-the-Training-Data.
Open Datasets | Yes | We used the following four datasets for our experiments. Adult: in this dataset [24], each example represents an adult individual... Communities and Crime: in this dataset from the UCI repository [29]... Law School: we used a preprocessed and balanced subset with 1,823 examples and 17 features [33]. COMPAS: we used a 2,000-example sample from the full dataset. For Adult, Communities and Crime, and Law School we used the preprocessed versions found in the accompanying GitHub repo of [22]. For COMPAS, we used a sample from the original dataset [1].
Dataset Splits | Yes | In order to evaluate different fair classifiers, we first split each dataset into five different random 80%-20% train-test splits. Then, we split each training set further into an 80%-20% training and validation set. Therefore, there were five random sets of 64%-16%-20% train-validation-test splits.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, cloud instances) used for running experiments were mentioned in the paper.
Software Dependencies | No | The paper mentions using "scikit-learn's logistic regression [27]" but does not provide specific version numbers for scikit-learn or other software dependencies.
Experiment Setup | Yes | To find the correct hyper-parameters (B, , T, and Tm) for our algorithm, we fixed T = 10 for EO, and T = 5 for DP, and used grid search for the hyper-parameters B, , and Tm. The tested values were {0.1, 0.2, . . . , 1} for B, {0, 0.05, . . . , 1} for , and {100, 200, . . . , 2000} for Tm.
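The reported splitting procedure (five random 64%-16%-20% train-validation-test splits) can be sketched with scikit-learn's `train_test_split`. The seeds and the synthetic data shapes below are illustrative assumptions, not taken from the paper:

```python
import numpy as np
from sklearn.model_selection import train_test_split

def make_splits(X, y, n_splits=5, seed=0):
    """Produce five random 64%-16%-20% train/validation/test splits.

    The per-split seeds are an assumption; the paper does not report them.
    """
    splits = []
    for i in range(n_splits):
        # First split off 20% of the full data as the test set.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.20, random_state=seed + i)
        # Then split the remaining 80% again 80/20, giving
        # 64% train and 16% validation of the full data.
        X_train, X_val, y_train, y_val = train_test_split(
            X_tr, y_tr, test_size=0.20, random_state=seed + i)
        splits.append((X_train, X_val, X_te, y_train, y_val, y_te))
    return splits

# Synthetic stand-in data (1,000 examples, 17 features, binary labels).
X = np.random.rand(1000, 17)
y = np.random.randint(0, 2, 1000)
splits = make_splits(X, y)
```

With 1,000 examples, each of the five splits contains 640 training, 160 validation, and 200 test examples.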
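The hyper-parameter search described in the Experiment Setup row is a plain grid search over the reported value ranges. A minimal sketch follows; `train_and_eval` is a hypothetical callable (the paper's actual training and validation routine is not shown here), and `hp2` stands in for the remaining hyper-parameter whose symbol is not reproduced above:

```python
from itertools import product

# Grids taken from the reported setup.
B_grid = [round(0.1 * k, 1) for k in range(1, 11)]     # {0.1, 0.2, ..., 1}
hp2_grid = [round(0.05 * k, 2) for k in range(0, 21)]  # {0, 0.05, ..., 1}
Tm_grid = list(range(100, 2001, 100))                  # {100, 200, ..., 2000}

def grid_search(train_and_eval, T=10):
    """Exhaustively evaluate the grid and keep the best configuration.

    train_and_eval(B, hp2, T, Tm) -> validation score to maximize.
    T is fixed (10 for EO, 5 for DP in the paper) rather than searched.
    """
    best, best_score = None, float("-inf")
    for B, hp2, Tm in product(B_grid, hp2_grid, Tm_grid):
        score = train_and_eval(B, hp2, T, Tm)
        if score > best_score:
            best, best_score = (B, hp2, T, Tm), score
    return best, best_score
```

For example, with a toy objective peaked at B = 0.5, hp2 = 0.5, Tm = 1000, the search returns exactly that configuration after evaluating all 10 x 21 x 20 = 4,200 grid points.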