Guided Discrete Diffusion for Electronic Health Record Generation
Authors: Jun Han, Zixiang Chen, Yongqian Li, Yiwen Kou, Eran Halperin, Robert E. Tillman, Quanquan Gu
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that EHR-D3PM significantly outperforms existing generative baselines on comprehensive fidelity and utility metrics while maintaining lower attribute and membership inference risks. Furthermore, we show EHR-D3PM is effective as a data augmentation method and enhances performance on downstream tasks when combined with real data. |
| Researcher Affiliation | Collaboration | Jun Han* (Optum AI, UHG); Zixiang Chen*, Yongqian Li, Yiwen Kou (Department of Computer Science, UCLA) |
| Pseudocode | No | The paper describes the model architecture and procedures using mathematical equations and descriptive text, but does not include a clearly labeled pseudocode block or algorithm. |
| Open Source Code | No | The paper mentions the open-source codebases for baseline models like EHRDiff and EHRMGAN, but it does not provide concrete access to the source code for the methodology described in this paper (EHR-D3PM). |
| Open Datasets | Yes | Public Datasets MIMIC-III (Johnson et al., 2016) includes deidentified patient EHRs from hospital stays. |
| Dataset Splits | Yes | MIMIC Dataset ... We have implemented an 80/20 split for training and testing purposes. Specifically, this allocates 12,862 records for testing and the remaining 51,451 for training. The first dataset, denoted by D1, includes a patient population of size 1,670,347. We split the whole dataset into 100K for validation, 200K for testing, and the remaining 1,370,347 for training. The second dataset, denoted by D2, includes a patient population of size 1,859,536. We split the whole dataset into 100K for validation, 200K for testing, and the remaining 1,559,536 for training. |
| Hardware Specification | Yes | It takes less than three hours to finish training this model on an A6000 with 48GB memory. ... It takes one and a half days to train one model on an A100 with 80GB memory. |
| Software Dependencies | No | The paper mentions using a light gradient boosting decision tree model (LGBM) and the AdamW optimizer, but it does not provide specific version numbers for these or other key software components, which is required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | The hidden dimension is 256. The number of multi-head attention heads is 8. The number of transformer layers is 5. The number of diffusion steps is 500. In the optimization phase, we adopt the AdamW optimizer, and the weight decay in AdamW is 1e-5. The learning rate is 1e-4 and the batch size is 256. The beta for the exponential LR schedule is 0.99. The number of training epochs is 100. ... For the downstream tasks, we used a light gradient boosting decision tree model (LGBM) (Ke et al., 2017) as it had uniformly robust prediction performance on all downstream tasks. In all experiments, we set the hyper-parameters of LGBM as follows: n_estimators = 1000, learning_rate = 0.05, max_depth = 10, reg_alpha = 0.5, reg_lambda = 0.5, scale_pos_weight = 1, min_data_in_bin = 128. |
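As a sanity check on the dataset-split figures quoted above, the D1 and D2 training sizes are consistent with carving out 100K validation and 200K test records from each population, and the MIMIC test fraction matches the stated 80/20 split. A minimal sketch (the helper function name is illustrative, not from the paper):

```python
# Verify the split arithmetic quoted in the Dataset Splits row.
# Assumes 100K validation / 200K test records for D1 and D2.

def train_size(total: int, n_val: int, n_test: int) -> int:
    """Records remaining for training after validation and test are removed."""
    return total - n_val - n_test

# MIMIC-III: 12,862 test + 51,451 train records, an 80/20 split.
mimic_total = 51_451 + 12_862  # 64,313 records
assert round(12_862 / mimic_total, 2) == 0.20

# D1: 1,670,347 total -> 1,370,347 for training.
assert train_size(1_670_347, 100_000, 200_000) == 1_370_347

# D2: 1,859,536 total -> 1,559,536 for training.
assert train_size(1_859_536, 100_000, 200_000) == 1_559_536

print("split arithmetic checks out")
```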
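The hyper-parameters quoted in the Experiment Setup row can be collected into config objects; since the paper's code is not released, the dictionary key names below are illustrative assumptions, though the LGBM keyword names follow the actual LightGBM API:

```python
# EHR-D3PM training settings as reported in the paper.
# Key names are hypothetical; only the values come from the text.
ehr_d3pm_config = {
    "hidden_dim": 256,
    "num_attention_heads": 8,
    "num_transformer_layers": 5,
    "diffusion_steps": 500,
    "optimizer": "AdamW",
    "weight_decay": 1e-5,
    "learning_rate": 1e-4,
    "batch_size": 256,
    "lr_schedule": {"type": "exponential", "gamma": 0.99},
    "epochs": 100,
}

# Downstream-task classifier settings, as they would be passed to
# lightgbm.LGBMClassifier(**lgbm_params).
lgbm_params = dict(
    n_estimators=1000,
    learning_rate=0.05,
    max_depth=10,
    reg_alpha=0.5,
    reg_lambda=0.5,
    scale_pos_weight=1,
    min_data_in_bin=128,
)
```

Having both configs in one place makes it easy to spot the gap the review flags under Software Dependencies: the values are fully specified, but the library versions needed to reproduce them are not.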