Integrative Analysis using Coupled Latent Variable Models for Individualizing Prognoses
Authors: Peter Schulam, Suchi Saria
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our approach to the problem of predicting lung disease trajectories in scleroderma, a complex autoimmune disease. We show that our model improves over state-of-the-art baselines in predictive accuracy and we provide a qualitative analysis of our model's output. Finally, the variability of disease presentation in scleroderma makes clinical trial recruitment challenging. We show that a prognostic tool that integrates multiple types of routinely collected longitudinal data can be used to identify individuals at greatest risk of rapid progression and to target trial recruitment. |
| Researcher Affiliation | Academia | Department of Computer Science Johns Hopkins University Baltimore, MD 21218, USA |
| Pseudocode | Yes | Figure 2: Two-stage procedure for fitting the Coupled Latent Trajectory Model (C-LTM). |
| Open Source Code | No | The paper does not contain an explicit statement about the availability of source code or a link to a code repository. |
| Open Datasets | No | We train and validate our model using data from the Johns Hopkins Scleroderma Center patient registry, one of the largest collections of clinical scleroderma data in the world. |
| Dataset Splits | Yes | We divide our data into 10 folds and use log-likelihood on the first fold for tuning hyperparameters. For PFVC, we select G = 9 subtypes using BIC. For the kernel hyperparameters Θ1 = {Σb, α, ℓ, σ²} we set Σb ∈ ℝ to be 16.0, which corresponds to the variance of individual-specific intercepts. We set α = 6, ℓ = 2, and σ² = 1 using a grid search over values chosen using domain knowledge. Qualitatively, these make sense; we expect transient deviations to last around 2 years and to change PFVC by around 6 units. Finally, we penalize the expected log-likelihood with respect to β1:G as in Eq. 4 and set the weight ρ = 0.01, which was chosen based on the clinical interpretability of the learned subtype trajectories. The remaining 9 folds were used for our cross-validation experiments. |
| Hardware Specification | No | On a standard laptop, we are able to train the model on 772 patients (5,458 PFVC measurements) in 10-20 minutes. |
| Software Dependencies | No | We optimize the objective using the Orthant-Wise Limited-memory Quasi-Newton (OWL-QN) algorithm (Andrew and Gao, 2007). |
| Experiment Setup | Yes | For the population model, we use constant functions (i.e. the basis expansion Φp(t) contains an intercept term whose coefficient is determined by baseline covariates). For the subpopulation B-splines, we set boundary knots at 0 and 25 years (the maximum observation time in our data set is 23 years), use two interior knots that divide the time period from 0-25 years into three equally spaced chunks, and use quadratics as the piecewise components. For the individual-specific long-term basis Φℓ, we use the same basis as the population model (constant functions). We divide our data into 10 folds and use log-likelihood on the first fold for tuning hyperparameters. For PFVC, we select G = 9 subtypes using BIC. For the kernel hyperparameters Θ1 = {Σb, α, ℓ, σ²} we set Σb ∈ ℝ to be 16.0, which corresponds to the variance of individual-specific intercepts. We set α = 6, ℓ = 2, and σ² = 1 using a grid search over values chosen using domain knowledge. Qualitatively, these make sense; we expect transient deviations to last around 2 years and to change PFVC by around 6 units. Finally, we penalize the expected log-likelihood with respect to β1:G as in Eq. 4 and set the weight ρ = 0.01, which was chosen based on the clinical interpretability of the learned subtype trajectories. |
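
The subpopulation B-spline setup quoted in the Experiment Setup row (boundary knots at 0 and 25 years, two equally spaced interior knots, quadratic pieces) can be illustrated with a minimal sketch. This is not the authors' code, which the paper does not release. The function names `make_knots` and `bspline_basis` are hypothetical, and the use of a clamped knot vector with the Cox-de Boor recursion is an assumption about how such a basis is typically built.

```python
def make_knots(lower, upper, n_interior, degree):
    """Clamped knot vector: boundary knots repeated degree + 1 times,
    with equally spaced interior knots (assumed construction)."""
    step = (upper - lower) / (n_interior + 1)
    interior = [lower + step * (i + 1) for i in range(n_interior)]
    return [lower] * (degree + 1) + interior + [upper] * (degree + 1)


def bspline_basis(t, knots, degree):
    """Evaluate every B-spline basis function at time t (Cox-de Boor recursion)."""
    n_intervals = len(knots) - 1
    # Degree 0: indicator of each half-open knot interval [k_i, k_{i+1}).
    basis = [1.0 if knots[i] <= t < knots[i + 1] else 0.0 for i in range(n_intervals)]
    if t == knots[-1]:
        # Close the final non-empty interval so the right boundary is covered.
        last = max(i for i in range(n_intervals) if knots[i] < knots[i + 1])
        basis[last] = 1.0
    for d in range(1, degree + 1):
        raised = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            # Terms with a zero-width support are dropped (treated as 0).
            left = (t - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = (knots[i + d + 1] - t) / right_den * basis[i + 1] if right_den > 0 else 0.0
            raised.append(left + right)
        basis = raised
    return basis


# Values taken from the quoted setup: 0 to 25 years, two interior knots, quadratics.
knots = make_knots(0.0, 25.0, n_interior=2, degree=2)
features_at_10_years = bspline_basis(10.0, knots, degree=2)
```

Under these assumptions the basis has five functions (two interior knots plus degree plus one), and the features at any time in [0, 25] sum to one, so each subtype trajectory is a weighted combination of five coefficients.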