reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Generalized Score Matching for Non-Negative Data

Authors: Shiqing Yu, Mathias Drton, Ali Shojaie

JMLR 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Simulation results and applications to RNAseq data are given in Section 7.
Researcher Affiliation	Academia	Shiqing Yu EMAIL Department of Statistics University of Washington, Seattle, WA, U.S.A. Mathias Drton EMAIL Department of Mathematical Sciences University of Copenhagen, Copenhagen, Denmark and Department of Statistics University of Washington, Seattle, WA, U.S.A. Ali Shojaie EMAIL Department of Biostatistics University of Washington, Seattle, WA, U.S.A.
Pseudocode	No	The paper mentions 'We use a coordinate-descent method analogous to Algorithm 2 in Lin et al. (2016)' but does not present pseudocode or an algorithm block within its own text.
Open Source Code	No	In our implementation for pairwise interaction models of Section 5.1 (that will become available in an R package), we optimize our loss functions with respect to a symmetric matrix K̂; in the non-centered case the vector η̂ is also included.
Open Datasets	Yes	In this section we apply our regularized generalized h-score matching estimator for truncated non-centered GGMs to RNAseq data also studied in Lin et al. (2016), since the same model is considered therein. The data consists of n = 487 prostate adenocarcinoma samples from The Cancer Genome Atlas (TCGA) data set.
Dataset Splits	No	The paper describes using 'm = 100 variables and n = 80 and n = 1000 samples' for simulations and 'n = 487 prostate adenocarcinoma samples' for RNAseq data, but does not specify how these samples are split into training, validation, or test sets.
Hardware Specification	No	The paper does not provide any specific details about the hardware used for running experiments, such as GPU models, CPU types, or other computing specifications.
Software Dependencies	No	The paper mentions that an 'R package' will be made available for their implementation, but it does not specify any software names with version numbers for R or any other libraries used in their experiments.
Experiment Setup	Yes	In our simulation experiments, we consider m = 100 variables and n = 80 and n = 1000 samples... The amplifier is set based on Theorem 16 to δ = C(n, m) = 1.8647 for truncated GGMs... We choose h(x) = min(x, 3) and use the upper-bound multiplier (high)... and choose the regularization parameter λ so that the estimated graph has exactly m = 333 edges.
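
The Experiment Setup entry quotes the choice h(x) = min(x, 3). As a minimal sketch of what that function looks like in code, and not the authors' implementation (their code is described as a forthcoming R package), the snippet below evaluates h and its almost-everywhere derivative on a few non-negative inputs. The function names `h` and `h_prime`, the `cap` parameter, and the convention of returning 0 for the derivative at the kink x = 3 are assumptions made for illustration.

```python
def h(x, cap=3.0):
    """Truncated identity h(x) = min(x, cap) for non-negative x."""
    if x < 0:
        raise ValueError("h is only defined here for non-negative inputs")
    return min(x, cap)


def h_prime(x, cap=3.0):
    """Derivative of h almost everywhere: 1 below the cap, 0 above it.

    h is not differentiable at x == cap; returning 0 there is an
    illustrative convention, not something stated in the source.
    """
    if x < 0:
        raise ValueError("h_prime is only defined here for non-negative inputs")
    return 1.0 if x < cap else 0.0


# Evaluate both functions on a few sample points (hypothetical values).
values = [0.0, 1.5, 3.0, 7.2]
h_values = [h(v) for v in values]
h_prime_values = [h_prime(v) for v in values]
```

Here h grows linearly up to the cap and is constant afterwards, so inputs of 3 or more all map to 3.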