reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Bayesian Multinomial Logistic Normal Models through Marginally Latent Matrix-T Processes

Authors: Justin D. Silverman, Kimberly Roche, Zachary C. Holmes, Lawrence A. David, Sayan Mukherjee

JMLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through both simulations and analyses of real datasets using MLN models, we show that our inference schemes are both highly accurate and often 4-5 orders of magnitude faster than MCMC.
Researcher Affiliation	Academia	Justin D. Silverman (EMAIL), College of Information Science and Technology, Department of Statistics, and Institute for Computational and Data Science, Penn State University, University Park, PA, 16802, USA; Kimberly Roche (EMAIL), Program in Computational Biology and Bioinformatics, Duke University, Durham, NC, 27708, USA; Zachary C. Holmes (EMAIL), Department of Molecular Genetics and Microbiology, Duke University, Durham, NC, 27708, USA; Lawrence A. David (EMAIL), Department of Molecular Genetics and Microbiology and Center for Genomic and Computational Biology, Duke University, Durham, NC, 27708, USA; Sayan Mukherjee (EMAIL), Departments of Statistical Science, Mathematics, Computer Science, Biostatistics & Bioinformatics, Duke University, Durham, NC, 27708, USA
Pseudocode	Yes	Algorithm 1: The Collapse-Uncollapse (CU) Sampler for Marginally LTP Models. Data: Y, υ, B, K, A. Result: S samples of the form {Ψ^(s), η^(s)}. Sample {η^(1), ..., η^(S)} ∼ p(η | Y), where p(η | Y) is an LTP; for s in {1, ..., S} do in parallel: sample Ψ^(s) ∼ p(Ψ | η^(s), Y).
Open Source Code	Yes	For inference of Marginally LTP models with multinomial observations and log-ratio link functions, we developed the R package fido (Silverman, 2019). Fido implements the CU sampler with Laplace approximation described above using optimized C++ code. ... Additionally all code required to reproduce the results of the next two sections, including the alternative implementations of multinomial logistic-normal linear models discussed in Section 5 is available as a GitHub repository at github.com/jsilve24/fido_paper_code.
Open Datasets	Yes	To demonstrate that LA Collapsed (from the R package fido) provides an accurate and efficient means of modeling real microbiome data, we reanalyzed a previously published study comparing microbial composition in the terminal ileum of subjects with CD to healthy controls (Gevers et al., 2014). Sequence count data was obtained from the R package MicrobeDS (github.com/twbattaglia/MicrobeDS). Sequence count data was obtained from the R package Fido (github.com/jsilve24/fido).
Dataset Splits	No	For each evaluated triple (N, D, Q), three simulated data-sets were created based on the multinomial logistic-normal linear model with the following specified likelihood: Y_j ∼ Multinomial(n_j, π_j), π_j = ALR_D^{-1}(η_j), η_j ∼ N(ΛX_j, Σ). To allow us to compare to alternative implementations we randomly subset the data to contain 83 samples.
Hardware Specification	No	All replicates of the simulated count data were supplied to the various implementations independently and the models were fit on identical hardware, allotted 64GB RAM, 4 cores, and restricted to a 48-hour upper limit on run-time.
Software Dependencies	Yes	All implementations were compiled and run using gcc version 6.2.0, R version 3.4.2, and Intel(R) Math Kernel Library version 2019 where possible.
Experiment Setup	Yes	Prior hyper-parameters were chosen to reflect common default choices, e.g., mean parameters set to zero, and covariance parameters set to the identity matrix. The prior degrees-of-freedom parameter ν is defined on the range ν > D, this parameter was chosen as ν = D + 10. To evaluate the impact of using small values for the degree-of-freedom parameter ν in model priors, we set ν = D + 3. A full description of our prior assumptions is given in Appendix I. To evaluate the impact of using small values for the degree-of-freedom parameter ν in model priors, we set ν = D + 2. Details on these kernels as well as the matrix functions Θ are described further in Appendix J.
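
Illustrative sketch (not from the paper or the fido package): the simulation model quoted in the Dataset Splits row draws latent log-ratios η_j from a normal distribution, maps them to proportions with the inverse additive log-ratio (ALR) transform, and then draws multinomial counts. The Python below is a minimal sketch of that generative process under stated assumptions: the function names `alr_inverse` and `simulate_counts` are hypothetical, the last of the D parts is taken as the ALR reference, and Σ is fixed to the identity for simplicity, which is not necessarily the setting the authors used.

```python
import math
import random


def alr_inverse(eta):
    """Map D-1 additive log-ratios to a D-part composition.

    Assumption: the last part is the reference, so its log-ratio is 0.
    """
    m = max(max(eta), 0.0)  # subtract the max for numerical stability
    exps = [math.exp(e - m) for e in eta] + [math.exp(-m)]
    total = sum(exps)
    return [x / total for x in exps]


def simulate_counts(Lambda, X, n, rng):
    """Draw Y_j ~ Multinomial(n_j, pi_j) with eta_j ~ N(Lambda x_j, I).

    Lambda: (D-1) x Q nested list; X: N covariate vectors of length Q;
    n: N sequencing depths. Sigma = identity is a simplifying assumption.
    """
    Y = []
    for x_j, n_j in zip(X, n):
        mean = [sum(l * x for l, x in zip(row, x_j)) for row in Lambda]
        eta_j = [rng.gauss(mu, 1.0) for mu in mean]
        pi_j = alr_inverse(eta_j)
        # Multinomial draw: n_j categorical draws, tallied per category.
        counts = [0] * len(pi_j)
        for k in rng.choices(range(len(pi_j)), weights=pi_j, k=n_j):
            counts[k] += 1
        Y.append(counts)
    return Y


rng = random.Random(0)
Lambda = [[0.5, -1.0], [1.0, 0.0]]               # D - 1 = 2 rows, Q = 2 columns
X = [[1.0, rng.gauss(0, 1)] for _ in range(5)]   # N = 5 samples, with intercept
Y = simulate_counts(Lambda, X, n=[1000] * 5, rng=rng)
```

Each row of `Y` is one simulated sample of D = 3 counts summing to its depth n_j. Replacing the independent `rng.gauss` draws with a draw from a general multivariate normal would be needed for a non-identity Σ.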