reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Improving Prediction from Dirichlet Process Mixtures via Enrichment

Authors: Sara Wade, David B. Dunson, Sonia Petrone, Lorenzo Trippa

JMLR 2014 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Advantages are shown through both predictive equations and examples, including an application to diagnosis Alzheimer's disease. We provide a simulated example in Section 5 to demonstrate how the EDP model can lead to more efficient estimators by making better use of information contained in the sample. Finally, in Section 6, we apply the model to predict Alzheimer's disease status based on measurements of various brain structures.
Researcher Affiliation	Academia	Sara Wade, Department of Engineering, University of Cambridge, Cambridge, CB2 1PZ, UK; David B. Dunson, Department of Statistical Science, Duke University, Durham, NC 27708-0251, USA; Sonia Petrone, Department of Decision Sciences, Bocconi University, Milan, 20136, Italy; Lorenzo Trippa, Department of Biostatistics, Harvard University, Boston, MA 02115, USA
Pseudocode	Yes	This appendix contains further details of the MCMC algorithm described in Section 4. The conditional distribution of $s_i = (s_{i,y}, s_{i,x})$, which denotes the vector containing the y-cluster and x-cluster membership for subject $i$, is ... Each iteration of the MCMC algorithm is summarized as follows: For $i = 1, \ldots, n$, if $s_{i,y} = j$ and $n_j^{-i} = 0$, then remove $\theta_j^*$ and $\psi_{l \mid j}^*$ from $(\theta^*, \psi^*)$. Otherwise, if $s_{i,y} = j$, $s_{i,x} = l$ and $n_{l \mid j}^{-i} = 0$, then remove $\psi_{l \mid j}^*$ from $\psi^*$. Next, sample $s_i$ given $\rho_{n-1}^{-i}, \theta^*, \psi^*, x_{1:n}, y_{1:n}$ as defined by Equation (16). If $s_{i,y} = k^{-i} + 1$, sample $\theta_{k^{-i}+1}^*$ given $y_i, x_i$ and $\psi_{1 \mid k^{-i}+1}^*$ given $x_i$ and concatenate them to $(\theta^*, \psi^*)$. Otherwise, if $s_{i,y} = j$ and $s_{i,x} = k_j^{-i} + 1$, sample $\psi_{k_j^{-i}+1 \mid j}^*$ given $x_i$ and concatenate it to $\psi^*$. Carry out the first move described in the Metropolis-Hastings step. Sample $u \sim U(0, 1)$. If $u < 0.5$, perform move 2, otherwise perform move 3. For $j = 1, \ldots, k$, sample $\theta_j^*$ given $(y_j^*, x_j^*)$, that is, from the posterior based on $p_{0\theta}(\theta_j^*)$ and $\prod_{i \in S_{j+}} K(y_i \mid x_i, \theta_j^*)$, and for $l = 1, \ldots, k_j$, sample $\psi_{l \mid j}^*$ given $x_{l \mid j}^*$, that is, from the posterior based on $p_{0\psi}(\psi_{l \mid j}^*)$ and $\prod_{i \in S_{j,l}} K(x_i \mid \psi_{l \mid j}^*)$.
Open Source Code	No	Not found. The paper mentions using third-party R packages ('coda', 'ROCR', 'kernlab', 'randomForest') but does not provide access to the authors' own implementation code for the methodology described.
Open Datasets	Yes	Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.ucla.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.
Dataset Splits	Yes	The data were randomly split into a training sample of size 185 and a test sample of size 192.
Hardware Specification	No	Not found. The paper does not specify the hardware (e.g., CPU, GPU models, memory) used for running experiments.
Software Dependencies	No	Not found. The paper mentions several R packages (coda, ROCR, kernlab, randomForest) but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup	Yes	A list of the prior parameters can be found in the Appendix. We assign hyperpriors to the mass parameters, where for the DP model, $\alpha \sim \text{Gamma}(1, 1)$, and for the EDP model, $\alpha_\theta \sim \text{Gamma}(1, 1)$, $\alpha_\psi(\beta, \sigma_y^2) \overset{iid}{\sim} \text{Gamma}(1, 1)$ for all $\beta, \sigma_y^2 \in \mathbb{R}^{p+1} \times \mathbb{R}^+$. The computational procedures described in Section 4 were used to obtain posterior inference with 20,000 iterations and burn in period of 5,000. An examination of the trace and autocorrelation plots for the subject-specific parameters $(\beta_i, \sigma_{y,i}^2, \mu_i, \sigma_{x,i}^2)$ provided evidence of convergence... For the prior parameters for the AD study in Section 6, $C^{-1}$ is a diagonal matrix with diagonal elements (400, .0001, .0001, 0.0004, 4, 4, .25, .25, 4, 4, 4, 4, 1, 1, 1, 1), and $\mu_0$ = (1000, 1450, 45, 3.25, 3.25, 2, 2, 2.4, 2.4, 2.5, 2.5, 2.3, 2.3, 2.75, 2.75); $c_{x,h} = 1/2$, $a_{x,h} = 2 \ \forall h$; and $b_x$ = (10000, 10000, 150, .25, .25, .25, .25, .04, .04, .04, .04, .04, .04, .1, .1). For both results the number of iterations is 50,000 with burn in period of 10,000.
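
The split sizes and run lengths quoted in the Dataset Splits and Experiment Setup rows can be sanity-checked with a short sketch. This is only an illustration under stated assumptions, not the authors' code: the paper reports no random seed, so the seed below is arbitrary, and the names random_split, discard_burn_in, run_a and run_b are hypothetical. It also assumes the usual reading of "burn in", namely that the initial draws are discarded before posterior summaries are computed.

```python
import random

N_TRAIN, N_TEST = 185, 192  # split sizes quoted in the Dataset Splits row


def random_split(n_train, n_test, seed):
    """Shuffle subject indices 0..n-1 and cut them into train and test lists."""
    rng = random.Random(seed)  # assumption: the paper does not report a seed
    indices = list(range(n_train + n_test))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:]


def discard_burn_in(draws, burn_in):
    """Keep only the draws recorded after the burn-in period."""
    return draws[burn_in:]


train_idx, test_idx = random_split(N_TRAIN, N_TEST, seed=0)

# Placeholder chains: integers stand in for posterior draws.
run_a = list(range(20_000))  # first quoted setting: 20,000 iterations
run_b = list(range(50_000))  # second quoted setting: 50,000 iterations

kept_a = discard_burn_in(run_a, burn_in=5_000)
kept_b = discard_burn_in(run_b, burn_in=10_000)

print(len(train_idx), len(test_idx))  # 185 192
print(len(kept_a), len(kept_b))       # 15000 40000
```

Under these assumptions the two settings leave 15,000 and 40,000 retained draws respectively, and the split covers 377 subjects in total. Because no seed is given, the authors' exact partition cannot be recovered this way.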