Fairness Overfitting in Machine Learning: An Information-Theoretic Perspective

Authors: Firas Laakom, Haobo Chen, Jürgen Schmidhuber, Yuheng Bu

ICML 2025

Reproducibility Assessment (each variable is listed with its result and the supporting LLM response)
Research Type: Experimental
LLM Response: Our empirical results validate the tightness and practical relevance of these bounds across diverse fairness-aware learning algorithms. Our framework offers valuable insights to guide the design of algorithms improving fairness generalization. ... We conduct extensive empirical experiments on standard fairness datasets, including COMPAS and Adult. These experiments highlight the tightness of our bounds and their ability to capture the complex behavior of the fairness generalization error, providing valuable insights for future algorithm design.

Researcher Affiliation: Academia
LLM Response: 1 Center of Excellence for Generative AI, KAUST, Saudi Arabia; 2 University of Florida, Gainesville, USA; 3 The Swiss AI Lab, IDSIA, USI & SUPSI, Switzerland. Correspondence to: Firas Laakom <EMAIL>, Yuheng Bu <EMAIL>.

Pseudocode: No
LLM Response: The paper describes theoretical frameworks and empirical evaluations but does not contain any explicit sections or figures labeled as 'Pseudocode' or 'Algorithm'.

Open Source Code: No
LLM Response: The paper does not contain any explicit statement about releasing code or a link to a code repository.
Open Datasets: Yes
LLM Response: We conduct experiments on the COMPAS dataset (Larson et al., 2016). ... We conduct extensive empirical experiments on standard fairness datasets, including COMPAS and Adult. ... The COMPAS (Larson et al., 2016) dataset, which involves recidivism prediction based on criminal and demographic records. ... The Adult (Kohavi & Becker, 1996) dataset, derived from U.S. Census data, which focuses on income prediction.
Dataset Splits: No
LLM Response: The paper mentions varying the 'number of training samples' and describes a super-sample framework for evaluating its bounds ('we draw m2 = 50 different train/test splits' for evaluating the CMI terms), but it does not specify conventional train/validation/test splits (e.g., percentages or fixed counts) used to train the fairness algorithms on the COMPAS or Adult datasets.
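The super-sample procedure quoted above (drawing m2 = 50 train/test splits from a fixed super-sample) follows the conditional-mutual-information evaluation style of Steinke and Zakynthinou. A minimal sketch, assuming a super-sample stored as an array of paired examples; the function name `supersample_splits` and the array layout are illustrative, not the authors' code:

```python
import numpy as np

def supersample_splits(z, m2=50, seed=0):
    """Given a super-sample z of shape (n, 2, ...), where each row holds a
    pair of i.i.d. examples, draw m2 random membership vectors u in {0,1}^n.
    For each draw, z[i, u[i]] goes to the training set and z[i, 1 - u[i]]
    to the held-out set. Returns a list of (train, test) arrays."""
    rng = np.random.default_rng(seed)
    n = z.shape[0]
    splits = []
    for _ in range(m2):
        u = rng.integers(0, 2, size=n)   # membership bits for this split
        idx = np.arange(n)
        train = z[idx, u]                # the selected copy of each pair
        test = z[idx, 1 - u]             # the held-out copy of each pair
        splits.append((train, test))
    return splits
```

Averaging a loss difference between the held-out and training halves over the m2 splits yields the empirical generalization gap against which CMI-style bounds are typically compared.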
Hardware Specification: No
LLM Response: The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models.
Software Dependencies: No
LLM Response: The paper mentions methods for CMI estimation, such as Ross (2014), but does not list specific software libraries with version numbers (e.g., Python 3.8, PyTorch 1.9) that were used in the experiments.
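For context, Ross (2014) is a nearest-neighbor estimator of mutual information between a continuous and a discrete variable; scikit-learn's `mutual_info_classif` cites it for the discrete-target case. A minimal sketch, assuming scikit-learn is available (the paper does not state which implementation it used):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)            # discrete variable (e.g., a label)
x_signal = y + 0.1 * rng.normal(size=1000)   # continuous, strongly informative
x_noise = rng.normal(size=1000)              # continuous, independent of y

X = np.column_stack([x_signal, x_noise])
mi = mutual_info_classif(X, y, n_neighbors=3, random_state=0)
# mi[0] (informative feature) should be large; mi[1] (noise) near zero
```

The estimate is in nats; for the near-separable feature above it should approach ln 2, the entropy of the binary label.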
Experiment Setup: No
LLM Response: The paper states: 'All approaches follow the same training protocol (architectures, hyperparameters, etc.) as in Han et al. (2024).' This defers specific experimental setup details, such as hyperparameters and architectures, to another paper rather than providing them directly in the main text.