A Flexible Generative Model for Heterogeneous Tabular EHR with Missing Modality

Authors: Huan He, William Hao, Yuanzhe Xi, Yong Chen, Bradley Malin, Joyce Ho

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically show that our model consistently outperforms existing state-of-the-art synthetic EHR generation methods both in fidelity by up to 3.10% and utility by up to 7.16%. Additionally, we show that our method can be successfully used in privacy-sensitive settings, where the original patient-level data cannot be shared."
Researcher Affiliation | Academia | Huan He, Department of Biostatistics, University of Pennsylvania; William Hao, Department of Computer Science, Emory University; Yuanzhe Xi, Department of Mathematics, Emory University; Yong Chen, Department of Biostatistics, University of Pennsylvania; Bradley Malin, Department of Biomedical Informatics, Vanderbilt University; Joyce C. Ho, Department of Biostatistics, Emory University
Pseudocode | Yes | "A.4 ALGORITHM OF FLEXGEN-EHR; Algorithm 1: Training of FLEXGEN-EHR"
Open Source Code | No | The paper states that code for the *baseline models* is available online (with links provided), but gives no explicit statement or link for the source code of FLEXGEN-EHR itself.
Open Datasets | Yes | "We use two real-world de-identified EHR datasets, MIMIC-III (Johnson et al., 2016) and eICU (Pollard et al., 2018)."
Dataset Splits | No | The paper does not give specific percentages or a methodology for train/validation/test splits, nor does it explicitly mention a validation set. It mentions using "test datasets" but not the splitting strategy.
Hardware Specification | Yes | "For training the models, we used Adam (Kingma & Ba, 2015) with the learning rate set to 0.001, and a mini-batch of 128 on a machine equipped with one Nvidia GeForce RTX 3090 and CUDA 11.2."
Software Dependencies | Yes | "We implemented FLEXGEN-EHR with PyTorch. For training the models, we used Adam (Kingma & Ba, 2015) with the learning rate set to 0.001, and a mini-batch of 128 on a machine equipped with one Nvidia GeForce RTX 3090 and CUDA 11.2."
Experiment Setup | Yes | "For training the models, we used Adam (Kingma & Ba, 2015) with the learning rate set to 0.001, and a mini-batch of 128... Hyperparameters of FLEXGEN-EHR are selected after grid search. We use a timestep of 50 and a noise scheduling β from 1×10⁻⁴ to 1×10⁻²."
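The reported experiment setup (diffusion timestep T = 50, noise schedule β from 1×10⁻⁴ to 1×10⁻²) matches the range of the common DDPM-style linear schedule. As a minimal sketch, assuming linear spacing of the betas (the paper's quote does not name the interpolation), the schedule can be reproduced in plain Python:

```python
# Hedged sketch of the reported noise schedule; the linear interpolation
# between beta_start and beta_end is an assumption, not confirmed by the paper.
T = 50                          # reported number of diffusion timesteps
beta_start, beta_end = 1e-4, 1e-2  # reported beta range

# Linearly spaced betas over T timesteps.
betas = [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

# Cumulative product of alphas (1 - beta), used by the forward noising
# process q(x_t | x_0) in standard diffusion models.
alphas_cumprod = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alphas_cumprod.append(prod)
```

In a PyTorch implementation these lists would typically be tensors (e.g. via `torch.linspace`), and the optimizer would be `torch.optim.Adam` with the reported learning rate of 0.001 and mini-batch size of 128.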