Fairness guarantees in multi-class classification with demographic parity
Authors: Christophe Denis, Romuald Elie, Mohamed Hebiri, François Hu
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The approach is evaluated on both synthetic and real datasets and proves very effective in decision making with a preset level of unfairness. In addition, our method is competitive with (if not better than) the state-of-the-art in binary and multi-class tasks. Keywords: algorithmic fairness, demographic parity, multi-class classification |
| Researcher Affiliation | Collaboration | Christophe Denis (EMAIL), LPSM, UMR-CNRS 8001, Sorbonne Université, 4 Place Jussieu, 75005 Paris, France; Romuald Elie (EMAIL), LAMA, UMR-CNRS 8050, Université Gustave Eiffel, 5 Bd Descartes, 77454 Marne-la-Vallée cedex 2, France; Mohamed Hebiri (EMAIL), LAMA, UMR-CNRS 8050, Université Gustave Eiffel, 5 Bd Descartes, 77454 Marne-la-Vallée cedex 2, France; François Hu (EMAIL), R&D Department, AI Lab, Milliman France, 14 Av. de la Grande Armée, 75017 Paris, France |
| Pseudocode | Yes | The implementation pseudo-code is provided in Algorithm 1. Algorithm 1 (ε-fairness calibration). Input: approximate fairness parameter ε, new data point (x, s), base estimators (p̂_k)_k, unlabeled sample D_N, and i.i.d. uniform perturbations (ζ_{k,i}^s)_{k,i,s} in [0, 10⁻⁵]. Step 0: split D_N and construct the samples (S_1, …, S_N) and {X_1^s, …, X_{N_s}^s}, for s ∈ S. Step 1: compute the empirical frequencies (π̂_s)_s based on (S_1, …, S_N). Step 2: compute λ̂⁽¹⁾ = (λ̂_1⁽¹⁾, …, λ̂_K⁽¹⁾) and λ̂⁽²⁾ = (λ̂_1⁽²⁾, …, λ̂_K⁽²⁾) as a solution of Eq. (1); the sequential quadratic programming of Section 4.1 can be used for this step. Step 3: compute ĝ thanks to Eq. (2). Output: ε-fair classification ĝ(x, s) at point (x, s). |
| Open Source Code | Yes | The source of our method can be found at https://github.com/curiousML/epsilon-fairness. |
| Open Datasets | Yes | Drug Consumption (DRUG) This dataset (Fehrman et al., 2017) contains demographic information such as age, gender, and education level, as well as measures of personality traits thought to influence drug use for 1885 respondents. Communities&Crime (CRIME) This dataset contains socio-economic, law enforcement, and crime data about communities in the US with 1994 examples. Following Calders et al. (2013), the sensitive feature is a binary variable that corresponds to ethnicity. |
| Dataset Splits | Yes | We generate n = 5000 synthetic examples and split the data into three sets (60% training, 20% hold-out and 20% unlabeled). |
| Hardware Specification | Yes | All algorithms were executed on an Apple M1 Pro processor to maintain consistency in the experimental setup. |
| Software Dependencies | No | The paper mentions "Random Forest (RF) with default parameters in scikit-learn" and "reglog, we use the default parameters in scikit-learn." without specifying a version number for scikit-learn or any other library. |
| Experiment Setup | Yes | For RF, we set the number of trees in {10, 11, . . . , 200}, the maximum depth of each tree in {2, 3, . . . , 16}, the minimum number of samples required to split an internal node in {2, 3, . . . , 10}, and the minimum number of samples required to be at a leaf node in {1, . . . , 8}; For GBM, we set L1 and L2 regularization terms on weights both in {0, 0.1, 1, 2, 5, 10, 20, 50}, the number of boosted trees in {10, 11, . . . , 200}, the maximum tree leaves in {6, 7, . . . , 50}, the maximum depth of each tree in {2, 3, . . . , 16}, and the minimum number of samples required in a child node for a split to occur in the tree in {10, 11, . . . , 100}. |
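The ε-fairness calibration described in the Pseudocode row might be sketched as below. This is not the authors' implementation: the binary s ∈ {−1, +1} encoding, the shifted-argmax rule π̂_s·p̂_k(x) − s·λ_k, and the softmax surrogate objective are illustrative assumptions standing in for Eq. (1) and Eq. (2), which are not reproduced in this report. Only the overall pipeline shape (group frequencies, λ solved by sequential quadratic programming, perturbed argmax prediction) follows Algorithm 1.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy setup: K classes, binary sensitive attribute s in {-1, +1}.
K, N = 3, 400
S = rng.choice([-1, 1], size=N)              # sensitive attributes of the unlabeled pool
scores = rng.dirichlet(np.ones(K), size=N)   # stand-in for base estimators (p_k)_k

# Step 1: empirical group frequencies pi_hat_s.
pi_hat = {s: np.mean(S == s) for s in (-1, 1)}

# Step 2 (stand-in for Eq. (1)): choose lambda so the lambda-shifted rule assigns
# each class at similar rates in both groups (demographic parity). A softmax with
# small temperature makes the per-group class rates smooth in lambda, so SLSQP
# (the sequential quadratic programming method named in the paper) can be used.
TAU = 0.05

def soft_rates(lam, s):
    shifted = (pi_hat[s] * scores[S == s] - s * lam) / TAU
    p = np.exp(shifted - shifted.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return p.mean(axis=0)                    # average class-assignment rates in group s

def objective(lam):
    gap = soft_rates(lam, 1) - soft_rates(lam, -1)
    return float(np.sum(gap ** 2))           # hypothetical surrogate unfairness measure

lam_hat = minimize(objective, x0=np.zeros(K), method="SLSQP").x

# Step 3 (stand-in for Eq. (2)): prediction with tiny tie-breaking noise in [0, 1e-5].
def g_hat(x_scores, s):
    zeta = rng.uniform(0.0, 1e-5, size=K)
    return int(np.argmax(pi_hat[s] * x_scores - s * lam_hat + zeta))

pred = g_hat(scores[0], S[0])
```

The perturbations ζ play the same role as in Algorithm 1: they break ties almost surely without materially changing the scores, so the argmax is well defined.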
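The RF and GBM search spaces quoted in the Experiment Setup row translate directly into scikit-learn-style parameter distributions; a minimal sketch is below. Only the ranges come from the paper. The choice of `RandomizedSearchCV`, the toy data, and the LightGBM-style GBM parameter names (`reg_alpha`, `reg_lambda`, `num_leaves`, `min_child_samples`) are assumptions, since the paper does not name the GBM library or the search strategy.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# RF search space, ranges as stated in the paper.
rf_space = {
    "n_estimators": list(range(10, 201)),       # number of trees
    "max_depth": list(range(2, 17)),            # maximum depth of each tree
    "min_samples_split": list(range(2, 11)),    # min samples to split a node
    "min_samples_leaf": list(range(1, 9)),      # min samples at a leaf
}

# GBM search space, ranges as stated; parameter names assume a LightGBM-style API.
gbm_space = {
    "reg_alpha": [0, 0.1, 1, 2, 5, 10, 20, 50],   # L1 regularization on weights
    "reg_lambda": [0, 0.1, 1, 2, 5, 10, 20, 50],  # L2 regularization on weights
    "n_estimators": list(range(10, 201)),          # number of boosted trees
    "num_leaves": list(range(6, 51)),              # maximum tree leaves
    "max_depth": list(range(2, 17)),               # maximum depth of each tree
    "min_child_samples": list(range(10, 101)),     # min samples in a child node
}

# Tiny synthetic problem just to exercise the RF search end to end.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    rf_space, n_iter=3, cv=3, random_state=0,
)
search.fit(X, y)
```

On the real datasets the paper's full ranges would of course be explored with a larger `n_iter`; the tiny settings here keep the sketch quick to run.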