FairPFN: A Tabular Foundation Model for Causal Fairness
Authors: Jake Robertson, Noah Hollmann, Samuel Müller, Noor Awad, Frank Hutter
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section assesses FairPFN's performance on synthetic and real-world benchmarks, highlighting its ability to remove the causal influence of protected attributes without user-specified knowledge of the causal model while maintaining high predictive accuracy. FairPFN is first evaluated on a series of synthetic causal case studies of increasing difficulty, an experimental setting in which the data-generating processes and all causal quantities are known, to assess its capacity to remove various sources of bias in causally generated data. |
| Researcher Affiliation | Collaboration | 1. ELLIS Institute Tübingen, 2. University of Freiburg, 3. Charité University Medicine Berlin, 4. Prior Labs, 5. Meta. |
| Pseudocode | Yes | We provide pseudocode for our pre-training algorithm in Algorithm 2, and outline the steps below. |
| Open Source Code | Yes | We provide a prediction interface to evaluate and assess our pre-trained model, as well as code to generate and visualize our pre-training data at https://github.com/jr2021/FairPFN. |
| Open Datasets | Yes | The first dataset is the Law School Admissions dataset from the 1998 LSAC National Longitudinal Bar Passage Study (Wightman, 1998), which includes admissions data for approximately 30,000 US law school applicants, revealing disparities in bar passage rates and first-year averages by ethnicity. The second dataset, derived from the 1994 US Census, is the Adult Census Income problem (Dua & Graff, 2017), containing demographic and income outcome data (INC 50K) for nearly 50,000 individuals. |
| Dataset Splits | Yes | After generating D_bias and D_fair, we partition them into training and validation sets: D_bias^train, D_bias^val, D_fair^train, and D_fair^val. Figure 6 shows the mean prediction average treatment effect (ATE) and predictive error (1-AUC) across 5 K-fold cross-validation iterations. |
| Hardware Specification | Yes | The transformer is trained for approximately 3 days on an RTX-2080 GPU on approximately 1.5 million different synthetic data-generating mechanisms, in which we vary the MLP architecture, the number of features m, the sample size n, and the non-linearities z. |
| Software Dependencies | No | The paper mentions 'XGBoost (Chen & Guestrin, 2016)' as a base model for EGR, but does not specify a version number for it or any other software dependencies like Python, PyTorch, or specific libraries used for the Fair PFN implementation. |
| Experiment Setup | Yes | The transformer is trained for approximately 3 days on an RTX-2080 GPU on approximately 1.5 million different synthetic data-generating mechanisms, in which we vary the MLP architecture, the number of features m, the sample size n, and the non-linearities z. For a robust evaluation, we generate 100 datasets per case study, varying the causal weights of protected attributes w_A, sample sizes n ∈ [100, 10000] (sampled on a log scale), and the standard deviation σ ∈ (0, 1) (log scale) of the additive noise terms. |
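The experiment-setup row describes drawing sample sizes and additive-noise scales on a log scale when generating the 100 datasets per case study. A minimal sketch of such log-uniform sampling, assuming a lower bound of 1e-3 for σ (a log scale cannot start exactly at 0; the bound and the helper name `sample_config` are assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_config(rng):
    # Sample size n drawn log-uniformly from [100, 10000], per the setup row.
    n = int(round(np.exp(rng.uniform(np.log(100), np.log(10_000)))))
    # Noise standard deviation drawn log-uniformly from (0, 1); the lower
    # bound 1e-3 is an assumption needed to make the log scale well-defined.
    sigma = float(np.exp(rng.uniform(np.log(1e-3), np.log(1.0))))
    return n, sigma

# One configuration per synthetic dataset; 100 datasets per case study.
configs = [sample_config(rng) for _ in range(100)]
```

Drawing `exp(uniform(log(a), log(b)))` rather than `uniform(a, b)` spreads the draws evenly across orders of magnitude, so small sample sizes and near-zero noise levels are as well represented as large ones.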
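The dataset-splits row reports the mean prediction average treatment effect (ATE) and predictive error (1-AUC) over 5 cross-validation folds. A hedged numpy sketch of how those two metrics might be computed; `kfold_metrics`, `fit_predict`, and the rank-based `auc` helper are hypothetical stand-ins, not the authors' code:

```python
import numpy as np

def auc(y_true, scores):
    """Rank-based AUC (Mann-Whitney U statistic) for binary labels."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def kfold_metrics(X, y, a, fit_predict, k=5, seed=0):
    """Mean prediction ATE and predictive error (1 - AUC) over k folds.

    `fit_predict(X_tr, y_tr, X_te)` is a hypothetical stand-in for any
    probabilistic classifier; `a` holds the binary protected attribute.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    ates, errs = [], []
    for i in range(k):
        te = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        p = fit_predict(X[tr], y[tr], X[te])
        # Prediction ATE: mean predicted-outcome gap between protected groups.
        ates.append(abs(p[a[te] == 1].mean() - p[a[te] == 0].mean()))
        errs.append(1 - auc(y[te], p))
    return float(np.mean(ates)), float(np.mean(errs))
```

Under this reading, a fairness-oriented predictor should drive the prediction ATE toward zero while keeping 1-AUC low, which matches the trade-off the checklist excerpt says Figure 6 visualizes.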