reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Bt-GAN: Generating Fair Synthetic Healthdata via Bias-transforming Generative Adversarial Networks

Authors: Resmi Ramachandranpillai, Md Fahim Sikder, David Bergström, Fredrik Heintz

JAIR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To evaluate the effectiveness of our proposed method, we conduct extensive experiments using the Medical Information Mart for Intensive Care (MIMIC-III) database. Our results demonstrate that Bt-GAN achieves state-of-the-art accuracy while significantly improving fairness and minimizing bias amplification. Furthermore, we perform an in-depth explainability analysis to provide additional evidence supporting the validity of our study.
Researcher Affiliation	Academia	Resmi Ramachandranpillai, Md Fahim Sikder, David Bergström, Fredrik Heintz; Department of Computer and Information Science (IDA), Linköping University, Sweden
Pseudocode	No	The paper describes the methodology using prose, mathematical equations, and architectural diagrams (e.g., Figure 2: Architecture of Bt-GAN), but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	No	The paper does not contain an unambiguous statement or a direct link to a source-code repository for the methodology described.
Open Datasets	Yes	Dataset: We use MIMIC-III, a publicly available healthcare database containing deidentified patient admissions between 2001 and 2012 (33798 unique patient stays). We obtained permission to access MIMIC-III for research purposes after completing an online course (certification number 45456719). UCI Adult Dataset: This dataset is based on US census data (1994) and contains 48,842 rows with attributes such as age, sex, occupation, and education level; the target variable indicates whether an individual's income exceeds 50K per year. In our experiments, we consider the protected attribute to be sex (S = Sex, Y = Income). ProPublica Dataset from the COMPAS Risk Assessment System: This dataset contains information about defendants from Broward County, including attributes such as ethnicity, language, marital status, and sex, and for each individual a score indicating the likelihood of recidivism (reoffending).
Dataset Splits	Yes	Dataset Preparation: We extracted records from the tables PATIENTS, ADMISSIONS, ICUSTAYS, CHARTEVENTS, LABEVENTS, and OUTPUTEVENTS. Records are then validated using HADM_ID and ICUSTAY_ID, resulting in a total of 33,798 patients with 42,276 ICU stays. Among them, we used 28,728 patients (35,948 ICU stays) for training and the remainder for testing.
Hardware Specification	Yes	We carried out the experiments using PyTorch on an 11th-generation Intel Core i9 with 128 GB RAM and two NVIDIA RTX 2080 Ti GPUs (11 GB each).
Software Dependencies	No	The paper mentions using PyTorch and the Adam optimizer but does not specify their version numbers, for example: 'We carried out the experiments using PyTorch in Intel Core i-9, 11th generation, with 128 GB RAM and GPU-2* NVIDIA RTX 2080TI (11 GB)' and 'The learning rate is set to .0001 with Adam optimizer.'
Experiment Setup	Yes	Implementation Details: The training epochs use a mini-batch size of 1024. The learning rate is set to 0.0001 with the Adam optimizer. When α = 0.5, the model performs comparatively better on the mortality prediction task. When α = 0.0, the MI reduction term in equation 8 is inactive, and so is the fairness constraint. Accordingly, we set α = 0.5 to balance the quality-fairness trade-off throughout training.
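The Dataset Splits row implies 33,798 − 28,728 = 5,070 test patients and 42,276 − 35,948 = 6,328 test ICU stays, i.e. roughly an 85/15 split by patient. The minimal sketch below assumes the split was made at the patient level (so no patient's stays appear in both sets) using a random shuffle; the helper `split_by_patient`, the seed, and the 0.85 fraction are assumptions inferred from the counts, not the authors' procedure.

```python
import random
from collections import defaultdict


def split_by_patient(stays, train_fraction=0.85, seed=0):
    """Split (subject_id, icustay_id) pairs so each patient falls on one side only.

    Hypothetical helper for illustration; the 0.85 default is inferred from the
    reported counts, not taken from the paper.
    """
    # Group ICU stays by patient so a patient's stays are never separated.
    by_patient = defaultdict(list)
    for subject_id, icustay_id in stays:
        by_patient[subject_id].append(icustay_id)

    # Sort first so the shuffle is reproducible for a given seed.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_train = round(train_fraction * len(patients))

    train_ids = set(patients[:n_train])
    train = [(p, s) for p, s in stays if p in train_ids]
    test = [(p, s) for p, s in stays if p not in train_ids]
    return train, test
```

With the reported totals, `round(0.85 * 33798)` gives 28,728 training patients, which matches the paper's count; splitting by patient rather than by stay avoids leaking one patient's history into both sets.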
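The Software Dependencies row notes that no PyTorch or Adam versions are given. As a generic sketch (not part of the paper's materials), a run could record its environment using only the Python standard library; `environment_report` is a hypothetical helper name, and `torch` is the pip distribution name for PyTorch. If Adam came from `torch.optim`, the PyTorch version would cover it as well.

```python
import importlib.metadata
import platform


def environment_report(packages=("torch",)):
    """Return Python, OS and package versions (None if a package is not installed)."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    # Print one "key: value" line per entry, e.g. for inclusion in a run log.
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```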
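The Experiment Setup row reports a mini-batch size of 1024, a learning rate of 0.0001 with Adam, and α = 0.5, where α = 0 disables the MI reduction term. Equation 8 itself is not reproduced in this summary, so the sketch below only assumes, for illustration, that α scales an additive MI term. The names `TrainingConfig`, `mutual_information`, and `combined_objective` are hypothetical, and the count-based (plug-in) estimate of I(S; Y) between a protected attribute S and an outcome Y is a stand-in chosen for clarity, not the paper's estimator.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values as reported in the implementation details.
    batch_size: int = 1024
    learning_rate: float = 1e-4  # used with the Adam optimizer
    alpha: float = 0.5           # weight of the MI (fairness) term


def mutual_information(s, y):
    """Plug-in estimate of I(S; Y) in nats for two equal-length discrete sequences."""
    n = len(s)
    joint = Counter(zip(s, y))
    count_s = Counter(s)
    count_y = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p(a, b) * log(p(a, b) / (p(a) * p(b)))
        mi += p_ab * math.log(p_ab / ((count_s[a] / n) * (count_y[b] / n)))
    return mi


def combined_objective(quality_loss, mi_term, alpha):
    """Assumed additive form (equation 8 may differ): alpha = 0 switches the MI term off."""
    return quality_loss + alpha * mi_term
```

A count-based estimate is not differentiable, so a GAN training loop would need a differentiable MI estimator in its place; the version above only makes the quantity and the effect of α = 0 concrete.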