Bt-GAN: Generating Fair Synthetic Healthdata via Bias-transforming Generative Adversarial Networks
Authors: Resmi Ramachandranpillai, Md Fahim Sikder, David Bergström, Fredrik Heintz
JAIR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the effectiveness of our proposed method, we conduct extensive experiments using the Medical Information Mart for Intensive Care (MIMIC-III) database. Our results demonstrate that Bt-GAN achieves state-of-the-art accuracy while significantly improving fairness and minimizing bias amplification. Furthermore, we perform an in-depth explainability analysis to provide additional evidence supporting the validity of our study. |
| Researcher Affiliation | Academia | Resmi Ramachandranpillai EMAIL Md Fahim Sikder EMAIL David Bergström EMAIL Fredrik Heintz EMAIL Department of Computer and Information Science (IDA), Linköping University, Sweden |
| Pseudocode | No | The paper describes the methodology using prose, mathematical equations, and architectural diagrams (e.g., Figure 2: Architecture of Bt-GAN), but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an unambiguous statement or a direct link to a source-code repository for the methodology described. |
| Open Datasets | Yes | Dataset: We use MIMIC-III, a publicly available healthcare database containing deidentified patient admissions between 2001 and 2012 (33,798 unique patient stays). We obtained permission to access MIMIC-III for research purposes after completion of an online course (certification number 45456719). UCI Adult Dataset: This dataset is based on US census data (1994) and contains 48,842 rows with attributes such as age, sex, occupation, and education level, and the target variable indicates whether an individual has an income that exceeds 50K per year or not. In our experiments, we consider the protected attribute to be sex (S = Sex, Y = Income). ProPublica Dataset from COMPAS Risk Assessment System: This dataset contains information about defendants from Broward County and contains attributes about defendants such as their ethnicity, language, marital status, sex, etc., and for each individual a score showing the likelihood of recidivism (reoffending). |
| Dataset Splits | Yes | Dataset Preparation: We extracted records from the tables PATIENTS, ADMISSIONS, ICUSTAYS, CHARTEVENTS, LABEVENTS, and OUTPUTEVENTS. Records are then validated using HADM_ID and ICUSTAY_ID, resulting in a total of 33,798 patients with 42,276 ICU stays. Among them, we split 28,728 patients (35,948 ICU stays) for training, and the remaining for testing. |
| Hardware Specification | Yes | We carried out the experiments using PyTorch on an Intel Core i9 (11th generation) with 128 GB RAM and 2× NVIDIA RTX 2080 Ti GPUs (11 GB). |
| Software Dependencies | No | The paper mentions using 'PyTorch' and the 'Adam optimizer' but does not specify their version numbers. For example: 'We carried out the experiments using PyTorch on an Intel Core i9 (11th generation) with 128 GB RAM and 2× NVIDIA RTX 2080 Ti GPUs (11 GB)' and 'The learning rate is set to 0.0001 with Adam optimizer.' |
| Experiment Setup | Yes | Implementation Details: The training epochs use a mini-batch size of 1024. The learning rate is set to 0.0001 with the Adam optimizer. We carried out the experiments using PyTorch on an Intel Core i9 (11th generation) with 128 GB RAM and 2× NVIDIA RTX 2080 Ti GPUs (11 GB). When α = 0.5, the model performs comparatively better on the mortality prediction task. Also, when α = 0.0, the MI reduction part in Equation 8 is inactive, and so is the fairness constraint. Accordingly, we set α = 0.5 to balance the quality-fairness trade-off for the entire process. |
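
The split counts and hyperparameters quoted in the table can be collected into a small, checkable record. The following is a minimal sketch, not the authors' code: the class and field names (`ReportedSetup`, `SplitCounts`) are hypothetical, and only the numeric values come from the quoted text. It also derives the held-out test sizes, which the paper describes only as "the remaining".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Hyperparameters as quoted in the table; field names are illustrative."""
    batch_size: int = 1024
    learning_rate: float = 1e-4
    optimizer: str = "Adam"
    alpha: float = 0.5  # weight chosen to balance quality and fairness


@dataclass(frozen=True)
class SplitCounts:
    """Total and training counts; the test count is whatever remains."""
    total: int
    train: int

    @property
    def test(self) -> int:
        return self.total - self.train

    @property
    def train_fraction(self) -> float:
        return self.train / self.total


# Counts quoted in the "Dataset Splits" row.
patients = SplitCounts(total=33_798, train=28_728)
icu_stays = SplitCounts(total=42_276, train=35_948)

if __name__ == "__main__":
    print(ReportedSetup())
    for name, split in (("patients", patients), ("ICU stays", icu_stays)):
        print(
            f"{name}: train={split.train}, test={split.test}, "
            f"train fraction={split.train_fraction:.3f}"
        )
```

Under these counts the held-out set works out to 5,070 patients and 6,328 ICU stays, which corresponds to roughly an 85/15 split. The paper excerpt does not state this ratio explicitly, so treat it as derived rather than reported.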