Improving Prediction from Dirichlet Process Mixtures via Enrichment

Authors: Sara Wade, David B. Dunson, Sonia Petrone, Lorenzo Trippa

JMLR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Advantages are shown through both predictive equations and examples, including an application to diagnosing Alzheimer's disease. We provide a simulated example in Section 5 to demonstrate how the EDP model can lead to more efficient estimators by making better use of information contained in the sample. Finally, in Section 6, we apply the model to predict Alzheimer's disease status based on measurements of various brain structures.
Researcher Affiliation | Academia | Sara Wade, Department of Engineering, University of Cambridge, Cambridge, CB2 1PZ, UK; David B. Dunson, Department of Statistical Science, Duke University, Durham, NC 27708-0251, USA; Sonia Petrone, Department of Decision Sciences, Bocconi University, Milan, 20136, Italy; Lorenzo Trippa, Department of Biostatistics, Harvard University, Boston, MA 02115, USA.
Pseudocode | Yes | This appendix contains further details of the MCMC algorithm described in Section 4. The conditional distribution of $s_i = (s_{i,y}, s_{i,x})$, which denotes the vector containing the y-cluster and x-cluster membership for subject $i$, is ... Each iteration of the MCMC algorithm is summarized as follows. (1) For $i = 1, \ldots, n$: if $s_{i,y} = j$, $s_{i,x} = l$ and $n_j^{-i} = 0$, remove $\theta_j^*$ and $\psi_{l|j}^*$ from $(\theta^*, \psi^*)$; otherwise, if $s_{i,y} = j$, $s_{i,x} = l$ and $n_{l|j}^{-i} = 0$, remove $\psi_{l|j}^*$ from $\psi^*$. Next, sample $s_i$ given $\rho_{n-1}^{-i}, \theta^*, \psi^*, x_{1:n}, y_{1:n}$ as defined by Equation (16). If $s_{i,y} = k^{-i} + 1$, sample $\theta_{k^{-i}+1}^*$ given $y_i, x_i$ and $\psi_{1|k^{-i}+1}^*$ given $x_i$, and concatenate them to $(\theta^*, \psi^*)$. Otherwise, if $s_{i,y} = j$ and $s_{i,x} = k_j^{-i} + 1$, sample $\psi_{k_j^{-i}+1|j}^*$ given $x_i$ and concatenate it to $\psi^*$. (2) Carry out the first move described in the Metropolis-Hastings step. (3) Sample $u \sim U(0, 1)$; if $u < 0.5$, perform move 2, otherwise perform move 3. (4) For $j = 1, \ldots, k$, sample $\theta_j^*$ given $(y_j^*, x_j^*)$, that is, from the posterior based on $p_{0\theta}(\theta_j^*)$ and $\prod_{i \in S_{j+}} K(y_i \mid x_i, \theta_j^*)$; and for $l = 1, \ldots, k_j$, sample $\psi_{l|j}^*$ given $x_{l|j}^*$, that is, from the posterior based on $p_{0\psi}(\psi_{l|j}^*)$ and $\prod_{i \in S_{j,l}} K(x_i \mid \psi_{l|j}^*)$.
Open Source Code | No | Not found. The paper mentions using third-party R packages ('coda', 'ROCR', 'kernlab', 'randomForest') but does not provide access to the authors' own implementation of the methodology described.
Open Datasets | Yes | Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.ucla.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.
Dataset Splits | Yes | The data were randomly split into a training sample of size 185 and a test sample of size 192.
Hardware Specification | No | Not found. The paper does not specify the hardware (e.g., CPU or GPU models, memory) used for running experiments.
Software Dependencies | No | Not found. The paper mentions several R packages (coda, ROCR, kernlab, randomForest) but does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | A list of the prior parameters can be found in the Appendix. We assign hyperpriors to the mass parameters: for the DP model, $\alpha \sim \mathrm{Gamma}(1, 1)$, and for the EDP model, $\alpha_\theta \sim \mathrm{Gamma}(1, 1)$ and $\alpha_\psi(\beta, \sigma_y^2) \overset{iid}{\sim} \mathrm{Gamma}(1, 1)$ for all $(\beta, \sigma_y^2) \in \mathbb{R}^{p+1} \times \mathbb{R}^+$. The computational procedures described in Section 4 were used to obtain posterior inference with 20,000 iterations and a burn-in period of 5,000. An examination of the trace and autocorrelation plots for the subject-specific parameters $(\beta_i, \sigma_{y,i}^2, \mu_i, \sigma_{x,i}^2)$ provided evidence of convergence... For the prior parameters for the AD study in Section 6, $C^{-1}$ is a diagonal matrix with diagonal elements (400, .0001, .0001, 0.0004, 4, 4, .25, .25, 4, 4, 4, 4, 1, 1, 1, 1), and $\mu_0 = (1000, 1450, 45, 3.25, 3.25, 2, 2, 2.4, 2.4, 2.5, 2.5, 2.3, 2.3, 2.75, 2.75)$; $c_{x,h} = 1/2$ and $a_{x,h} = 2$ for all $h$; and $b_x = (10000, 10000, 150, .25, .25, .25, .25, .04, .04, .04, .04, .04, .04, .1, .1)$. For both results, the number of iterations is 50,000 with a burn-in period of 10,000.
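The Pseudocode entry describes a sweep over subject allocations $s_i = (s_{i,y}, s_{i,x})$. The Python sketch below illustrates only two pieces of step (1):

- the bookkeeping that removes parameters of clusters left empty when subject $i$ is taken out;
- the prior part of the allocation weights, assuming the standard enriched Pólya-urn scheme.

Under that scheme, an existing y-cluster $j$ is weighted by $n_j^{-i}$ and a new y-cluster by $\alpha_\theta$. Within y-cluster $j$, an existing x-cluster $l$ gets probability $n_{l|j}^{-i}/(\alpha_\psi + n_j^{-i})$ and a new x-cluster gets $\alpha_\psi/(\alpha_\psi + n_j^{-i})$, where $\alpha_\psi$ may differ by y-cluster.

The likelihood factors of Equation (16), which is elided above, are deliberately left out. The function names and the dictionary layout are hypothetical and are not the authors' code.

```python
from collections import Counter


def remove_subject(i, s_y, s_x, theta, psi):
    """Step (1) bookkeeping: drop parameters of clusters that subject i leaves empty.

    Hypothetical layout: s_y[k], s_x[k] are the y- and x-cluster labels of
    subject k; theta maps j -> theta*_j and psi maps (l, j) -> psi*_{l|j}.
    """
    j, l = s_y[i], s_x[i]
    others = [k for k in range(len(s_y)) if k != i]
    n_j = sum(1 for k in others if s_y[k] == j)  # n_j^{-i}
    n_lj = sum(1 for k in others if s_y[k] == j and s_x[k] == l)  # n_{l|j}^{-i}
    if n_j == 0:
        # y-cluster j would be empty: remove theta*_j and its only psi*_{l|j}
        theta.pop(j, None)
        psi.pop((l, j), None)
    elif n_lj == 0:
        # only x-cluster l inside y-cluster j would be empty
        psi.pop((l, j), None)


def prior_allocation_weights(i, s_y, s_x, alpha_theta, alpha_psi):
    """Unnormalized enriched Polya-urn prior weights for s_i, excluding subject i.

    alpha_psi maps y-cluster j -> alpha_psi(theta*_j). The likelihood terms of
    Equation (16) are NOT included; they would multiply these weights.
    """
    others = [k for k in range(len(s_y)) if k != i]
    n_y = Counter(s_y[k] for k in others)
    n_xy = Counter((s_y[k], s_x[k]) for k in others)
    weights = {}
    for j, n_j in n_y.items():
        a = alpha_psi[j]
        for (jj, l), n_lj in n_xy.items():
            if jj == j:
                # join y-cluster j and its existing x-cluster l
                weights[(j, l)] = n_j * n_lj / (a + n_j)
        # join y-cluster j but open a new x-cluster inside it
        weights[(j, "new")] = n_j * a / (a + n_j)
    # open a new y-cluster (which necessarily opens a new x-cluster)
    weights[("new", "new")] = alpha_theta
    return weights


# Usage: three subjects; subject 2 is alone in y-cluster 1.
s_y, s_x = [0, 0, 1], [0, 1, 0]
theta = {0: "theta0", 1: "theta1"}
psi = {(0, 0): "psi0|0", (1, 0): "psi1|0", (0, 1): "psi0|1"}
remove_subject(2, s_y, s_x, theta, psi)
w = prior_allocation_weights(2, s_y, s_x, alpha_theta=1.0, alpha_psi={0: 1.0})
```

In this usage, removing subject 2 empties y-cluster 1, so $\theta_1^*$ and $\psi_{0|1}^*$ are dropped. The unnormalized weights then sum to $\alpha_\theta + n - 1 = 3$, the usual urn normalizing constant.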
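The Dataset Splits entry reports a random split into 185 training and 192 test subjects, 377 in total. A minimal Python sketch of such a split is given below. The `random_split` helper and the seed value are hypothetical: the seed is fixed only to make the illustration reproducible, and no seed is reported in the source.

```python
import random


def random_split(n_total, n_train, seed=None):
    """Randomly partition subject indices 0..n_total-1 into train and test sets."""
    rng = random.Random(seed)  # hypothetical seed; none is reported in the source
    idx = list(range(n_total))
    rng.shuffle(idx)
    return sorted(idx[:n_train]), sorted(idx[n_train:])


train_idx, test_idx = random_split(185 + 192, 185, seed=2014)
print(len(train_idx), len(test_idx))  # 185 192
```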
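The Experiment Setup entry describes two practices: discarding a burn-in period (5,000 of 20,000 iterations, or 10,000 of 50,000 in the later runs) and checking trace and autocorrelation plots. The paper mentions the R package coda, but the sketch below uses plain Python for illustration. It discards a burn-in and computes the sample autocorrelation at a given lag, using the standard estimator (the one R's `acf` uses). The AR(1) series standing in for sampler output is purely synthetic; it is not one of the paper's chains.

```python
import random


def discard_burn_in(chain, burn_in):
    """Keep only the draws after the first `burn_in` iterations."""
    return chain[burn_in:]


def autocorrelation(x, lag):
    """Sample autocorrelation at `lag`: lagged cross-products over total sum of squares."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom


# Synthetic stand-in for one parameter's trace: an AR(1) series with coefficient 0.5.
rng = random.Random(0)
chain, value = [], 0.0
for _ in range(20_000):
    value = 0.5 * value + rng.gauss(0.0, 1.0)
    chain.append(value)

kept = discard_burn_in(chain, 5_000)
print(len(kept))  # 15000
lag1 = autocorrelation(kept, 1)  # close to 0.5 for this AR(1) series
```

In practice, the check would be repeated for each monitored parameter (for example, the subject-specific parameters listed above) over several lags. Autocorrelations that decay quickly toward zero are consistent with good mixing.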