reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Direct Approach for Sparse Quadratic Discriminant Analysis

Authors: Binyan Jiang, Xiangyu Wang, Chenlei Leng

JMLR 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The promising performance of DA-QDA is illustrated via extensive simulation studies and the analysis of four real datasets. Keywords: Bayes Risk, Consistency, High Dimensional Data, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Sparsity
Researcher Affiliation	Collaboration	Binyan Jiang EMAIL Department of Applied Mathematics The Hong Kong Polytechnic University Hong Kong, China Xiangyu Wang EMAIL Google LLC 1600 Amphitheatre Pkwy Mountain View, CA 94043, USA Chenlei Leng EMAIL Department of Statistics University of Warwick and Alan Turing Institute Coventry, CV4 7AL, UK
Pseudocode	Yes	Our algorithm can be now summarized as following. 1. Initialize Ω, Ψ and Λ. Fix ρ. Compute SVD Σ̂1 = U1 D1 U1^T and Σ̂2 = U2 D2 U2^T, and compute B where B_jk = 1/(d_1j d_2k + ρ). Repeat steps 2-4 until convergence; 2. Compute A = (Σ̂1 − Σ̂2) − Λ + ρΨ. Then update Ω as Ω = U1 [B ∘ (U1^T A U2)] U2^T; 3. Update Ψ by soft-thresholding Ω + Λ/ρ elementwise by λ/ρ; 4. Update Λ by Λ ← Λ + ρ(Ω − Ψ).
Open Source Code	No	The paper does not provide concrete access to source code for the DA-QDA methodology. It mentions using third-party tools like Matlab, R package dsda, and glmnet package, but no explicit release of the authors' implementation.
Open Datasets	Yes	Quora answer classifier. This is a data challenge available at http://www.quora.com/challenges#answer_classifier. ... Gastrointestinal Lesions. This dataset (P. Mesejo et al., 2016) contains the features extracted from a database of colonoscopic videos... Pancreatic cancer RNA-seq data. The dataset (Weinstein et al., 2013) is part of the RNA-Seq (HiSeq) PANCAN data set... Prostate cancer. Taken from ftp://stat.ethz.ch/Manuscripts/dettling/prostate.rda, this data contains genetic expression levels...
Dataset Splits	Yes	For the Quora answer data, we randomly split the data into ten parts, fit a model to the nine parts of the data, and report the misclassification error on the part that is left out. (10-fold cross-validation) ... for Gastrointestinal Lesions, we perform a 10-fold cross-validation. ... For Pancreatic cancer RNA-seq data, we randomly split the dataset in two equal subsets, train on one subset and test on the other. We repeat this procedure 50 times... For the prostate cancer data... using 10-fold cross-validation.
Hardware Specification	No	The paper does not provide specific details about the hardware used for running the experiments. It only mentions software tools and packages, but no GPU or CPU models, or other hardware specifications.
Software Dependencies	No	To fit DA-QDA, we employ ADMM to estimate Ω and the coordinate-wise descent algorithm (Friedman et al., 2010) to fit δ. ... We implemented sQDA in Matlab... we employ Matlab's built-in function fitcdiscr... and the R package dsda (Mai et al., 2012) to fit DSDA. ... we use the glmnet package and set α = 0.5...
Experiment Setup	Yes	The rate parameter ρ in ADMM is set according to the optimal criterion suggested by Ghadimi et al. (2015). The other two tuning parameters, λ for estimating Ω and λ_δ for estimating δ, are chosen by 5-fold cross-validation, where the loss function is chosen to be the out-of-sample misclassification rate.
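
The ADMM steps quoted in the Pseudocode row can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation (the Open Source Code row notes that none was released). It reads step 2 as A = (Σ̂1 − Σ̂2) − Λ + ρΨ with an elementwise product by B, and steps 3 and 4 as soft-thresholding Ω + Λ/ρ by λ/ρ followed by Λ ← Λ + ρ(Ω − Ψ). The function names `soft_threshold` and `da_qda_admm`, the zero initialization, the default ρ = 1, and the stopping rule are all assumptions made for the example; the paper instead sets ρ following Ghadimi et al. (2015).

```python
import numpy as np


def soft_threshold(M, t):
    """Elementwise soft-thresholding: sign(m) * max(|m| - t, 0)."""
    return np.sign(M) * np.maximum(np.abs(M) - t, 0.0)


def da_qda_admm(S1, S2, lam, rho=1.0, max_iter=1000, tol=1e-8):
    """Hypothetical sketch of steps 1-4 for symmetric PSD estimates S1, S2.

    Returns the sparse iterate Psi. Defaults and stopping rule are
    illustrative assumptions, not values taken from the paper.
    """
    p = S1.shape[0]

    # Step 1: for symmetric PSD matrices the eigendecomposition plays the
    # role of the SVD; B holds 1 / (d1_j * d2_k + rho).
    d1, U1 = np.linalg.eigh(S1)
    d2, U2 = np.linalg.eigh(S2)
    B = 1.0 / (np.outer(d1, d2) + rho)

    # Assumed initialization: all zeros.
    Omega = np.zeros((p, p))
    Psi = np.zeros((p, p))
    Lam = np.zeros((p, p))

    for _ in range(max_iter):
        # Step 2: solve S1 @ Omega @ S2 + rho * Omega = A in closed form.
        A = (S1 - S2) - Lam + rho * Psi
        Omega = U1 @ (B * (U1.T @ A @ U2)) @ U2.T

        # Step 3: sparse copy of Omega via elementwise soft-thresholding.
        Psi_new = soft_threshold(Omega + Lam / rho, lam / rho)

        # Step 4: dual variable update.
        Lam = Lam + rho * (Omega - Psi_new)

        # Assumed stopping rule: small primal residual and small change in Psi.
        primal = np.abs(Omega - Psi_new).max()
        change = np.abs(Psi_new - Psi).max()
        Psi = Psi_new
        if max(primal, change) < tol:
            break

    return Psi
```

With `lam=0.0` and invertible inputs, the iterates approach inv(S2) − inv(S1), which gives a quick sanity check of the update in step 2; a sufficiently large `lam` shrinks every entry of the returned matrix to zero.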