Fairness Overfitting in Machine Learning: An Information-Theoretic Perspective
Authors: Firas Laakom, Haobo Chen, Jürgen Schmidhuber, Yuheng Bu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results validate the tightness and practical relevance of these bounds across diverse fairness-aware learning algorithms. Our framework offers valuable insights to guide the design of algorithms improving fairness generalization. ... We conduct extensive empirical experiments on standard fairness datasets, including COMPAS and Adult. These experiments highlight the tightness of our bounds and their ability to capture the complex behavior of the fairness generalization error, providing valuable insights for future algorithm design. |
| Researcher Affiliation | Academia | 1Center of Excellence for Generative AI, KAUST, Saudi Arabia 2University of Florida, Gainesville, USA 3The Swiss AI Lab, IDSIA, USI & SUPSI, Switzerland. Correspondence to: Firas Laakom <EMAIL>, Yuheng Bu <EMAIL>. |
| Pseudocode | No | The paper describes theoretical frameworks and empirical evaluations but does not contain any explicit sections or figures labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing code or a link to a code repository. |
| Open Datasets | Yes | We conduct experiments on the COMPAS dataset (Larson et al., 2016). ... We conduct extensive empirical experiments on standard fairness datasets, including COMPAS and Adult. ... The COMPAS (Larson et al., 2016) dataset, which involves recidivism prediction based on criminal and demographic records. ... The Adult (Kohavi & Becker, 1996) dataset, derived from U.S. Census data, which focuses on income prediction. |
| Dataset Splits | No | The paper mentions varying the 'number of training samples' and describes a super-sample framework for evaluating bounds ('we draw m2 = 50 different train/test splits' for the evaluation of CMI terms), but it does not specify conventional train/validation/test splits (e.g., percentages or fixed counts) used to train the fairness algorithms themselves on the COMPAS or Adult datasets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions methods for CMI estimation, such as 'Ross (2014)', but does not list specific software libraries with version numbers (e.g., Python 3.8, PyTorch 1.9) that were used in the experiments. |
| Experiment Setup | No | The paper states: 'All approaches follow the same training protocol (architectures, hyperparameters, etc.) as in Han et al. (2024).' This defers specific experimental setup details like hyperparameters and architectures to another paper, rather than providing them directly in the main text. |
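The super-sample protocol mentioned under "Dataset Splits" (drawing m2 = 50 random train/test splits of a fixed pool of examples and measuring the train-vs-test fairness gap on each) can be sketched as follows. This is a minimal illustrative sketch, not the authors' code: the toy data, the threshold "learner", and the demographic-parity metric are all assumptions standing in for the fairness-aware algorithms the paper actually evaluates.

```python
# Hypothetical sketch of a super-sample evaluation: draw m2 = 50 random
# train/test splits of a 2n-point pool and record the gap between the
# train and test fairness measures on each split.
import numpy as np

rng = np.random.default_rng(0)

def demographic_parity_gap(y_pred, group):
    # |P(yhat = 1 | group = 0) - P(yhat = 1 | group = 1)|
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

# Toy "super-sample" of 2n examples: features, labels, sensitive attribute.
n = 100
X = rng.normal(size=(2 * n, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=2 * n) > 0).astype(int)
g = rng.integers(0, 2, size=2 * n)

m2 = 50  # number of random train/test splits, as in the paper's evaluation
gaps = []
for _ in range(m2):
    # Random assignment of each of the 2n points to train (True) or test.
    mask = rng.permutation(np.repeat([True, False], n))
    # Stand-in learner: threshold on the first feature (placeholder for a
    # fairness-aware training algorithm).
    threshold = np.median(X[mask, 0])
    pred = (X[:, 0] > threshold).astype(int)
    # Fairness generalization error on this split: train vs. test gap.
    gaps.append(abs(demographic_parity_gap(pred[mask], g[mask])
                    - demographic_parity_gap(pred[~mask], g[~mask])))

mean_gap = float(np.mean(gaps))  # averaged over the m2 splits
```

Averaging the per-split gaps over the m2 draws is what lets the empirical fairness generalization error be compared against the paper's information-theoretic bounds.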