A Flexible Generative Model for Heterogeneous Tabular EHR with Missing Modality

Authors: Huan He, William Hao, Yuanzhe Xi, Yong Chen, Bradley Malin, Joyce Ho

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically show that our model consistently outperforms existing state-of-the-art synthetic EHR generation methods both in fidelity by up to 3.10% and utility by up to 7.16%. Additionally, we show that our method can be successfully used in privacy-sensitive settings, where the original patient-level data cannot be shared."
Researcher Affiliation | Academia | Huan He, Department of Biostatistics, University of Pennsylvania; William Hao, Department of Computer Science, Emory University; Yuanzhe Xi, Department of Mathematics, Emory University; Yong Chen, Department of Biostatistics, University of Pennsylvania; Bradley Malin, Department of Biomedical Informatics, Vanderbilt University; Joyce C. Ho, Department of Biostatistics, Emory University
Pseudocode | Yes | "A.4 ALGORITHM OF FLEXGEN-EHR; Algorithm 1: Training of FLEXGEN-EHR"
Open Source Code | No | The paper states that code for the *baseline models* is available online (with links provided), but gives no explicit statement or link for the source code of FLEXGEN-EHR itself.
Open Datasets | Yes | "We use two real-world de-identified EHR datasets, MIMIC-III (Johnson et al., 2016) and eICU (Pollard et al., 2018)."
Dataset Splits | No | The paper does not give specific percentages or a methodology for train/validation/test splits, nor does it explicitly mention a validation set. It mentions using "test datasets" but not the splitting strategy.
Hardware Specification | Yes | "For training the models, we used Adam (Kingma & Ba, 2015) with the learning rate set to 0.001, and a mini-batch of 128 on a machine equipped with one Nvidia GeForce RTX 3090 and CUDA 11.2."
Software Dependencies | Yes | "We implemented FLEXGEN-EHR with PyTorch. For training the models, we used Adam (Kingma & Ba, 2015) with the learning rate set to 0.001, and a mini-batch of 128 on a machine equipped with one Nvidia GeForce RTX 3090 and CUDA 11.2."
Experiment Setup | Yes | "For training the models, we used Adam (Kingma & Ba, 2015) with the learning rate set to 0.001, and a mini-batch of 128... Hyperparameters of FLEXGEN-EHR are selected after grid search. We use a timestep of 50 and a noise scheduling β from 1×10⁻⁴ to 1×10⁻²."
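The reported experiment setup (diffusion timestep T = 50, noise schedule β from 1×10⁻⁴ to 1×10⁻²) matches the range of the common DDPM-style linear schedule. As a minimal sketch, assuming linear spacing of the betas (the paper's quote does not name the interpolation), the schedule can be reproduced in plain Python:

```python
# Hedged sketch of the reported noise schedule; the linear interpolation
# between beta_start and beta_end is an assumption, not confirmed by the paper.
T = 50                          # reported number of diffusion timesteps
beta_start, beta_end = 1e-4, 1e-2  # reported beta range

# Linearly spaced betas over T timesteps.
betas = [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

# Cumulative product of alphas (1 - beta), used by the forward noising
# process q(x_t | x_0) in standard diffusion models.
alphas_cumprod = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alphas_cumprod.append(prod)
```

In a PyTorch implementation these lists would typically be tensors (e.g. via `torch.linspace`), and the optimizer would be `torch.optim.Adam` with the reported learning rate of 0.001 and mini-batch size of 128.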