A Direct Approach for Sparse Quadratic Discriminant Analysis

Authors: Binyan Jiang, Xiangyu Wang, Chenlei Leng

JMLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The promising performance of DA-QDA is illustrated via extensive simulation studies and the analysis of four real datasets. Keywords: Bayes Risk, Consistency, High Dimensional Data, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Sparsity
Researcher Affiliation | Collaboration | Binyan Jiang (EMAIL), Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, China; Xiangyu Wang (EMAIL), Google LLC, 1600 Amphitheatre Pkwy, Mountain View, CA 94043, USA; Chenlei Leng (EMAIL), Department of Statistics, University of Warwick and Alan Turing Institute, Coventry, CV4 7AL, UK
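Pseudocode | Yes | Our algorithm can now be summarized as follows. 1. Initialize $\Omega$, $\Psi$ and $\Lambda$. Fix $\rho$. Compute the SVDs $\hat\Sigma_1 = U_1 D_1 U_1^T$ and $\hat\Sigma_2 = U_2 D_2 U_2^T$, and compute $B$ where $B_{jk} = 1/(d_{1j} d_{2k} + \rho)$. Repeat steps 2-4 until convergence. 2. Compute $A = (\hat\Sigma_1 - \hat\Sigma_2) - \Lambda + \rho\Psi$. Then update $\Omega$ as $\Omega = U_1 [B \circ (U_1^T A U_2)] U_2^T$. 3. Update $\Psi$ by soft-thresholding $\Omega + \Lambda/\rho$ elementwise at level $\lambda/\rho$. 4. Update $\Lambda$ by $\Lambda \leftarrow \Lambda + \rho(\Omega - \Psi)$.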
Open Source Code | No | The paper does not provide concrete access to source code for the DA-QDA methodology. It mentions third-party tools such as Matlab, the R package dsda, and the glmnet package, but there is no explicit release of the authors' implementation.
Open Datasets | Yes | Quora answer classifier: This is a data challenge available at http://www.quora.com/challenges#answer_classifier. ... Gastrointestinal Lesions: This dataset (P. Mesejo et al., 2016) contains the features extracted from a database of colonoscopic videos... Pancreatic cancer RNA-seq data: The dataset (Weinstein et al., 2013) is part of the RNA-Seq (HiSeq) PANCAN data set... Prostate cancer: Taken from ftp://stat.ethz.ch/Manuscripts/dettling/prostate.rda, this data contains genetic expression levels...
Dataset Splits | Yes | For the Quora answer data, we randomly split the data into ten parts, fit a model to nine parts of the data, and report the misclassification error on the part that is left out (10-fold cross-validation). ... for Gastrointestinal Lesions, we perform a 10-fold cross-validation. ... For Pancreatic cancer RNA-seq data, we randomly split the dataset into two equal subsets, train on one subset and test on the other. We repeat this procedure 50 times... For the prostate cancer data... using 10-fold cross-validation.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. It mentions only software tools and packages, with no GPU or CPU models or other hardware specifications.
Software Dependencies | No | To fit DA-QDA, we employ ADMM to estimate Ω and the coordinate-wise descent algorithm (Friedman et al., 2010) to fit δ. ... We implemented sQDA in Matlab... we employ Matlab's built-in function fitcdiscr... and the R package dsda (Mai et al., 2012) to fit DSDA. ... we use the glmnet package and set α = 0.5...
Experiment Setup | Yes | The rate parameter ρ in ADMM is set according to the optimal criterion suggested by Ghadimi et al. (2015). The other two tuning parameters, λ for estimating Ω and λ_δ for estimating δ, are chosen by 5-fold cross-validation, where the loss function is chosen to be the out-of-sample misclassification rate.
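
The four ADMM steps quoted in the Pseudocode row can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation (the Open Source Code row notes that none was released). The extracted text has lost several operators, so the minus signs in A, the elementwise (Hadamard) product with B, and the divisions by ρ in the soft-thresholding step are reconstructions. The function names, the zero initialization, the default ρ = 1.0 (rather than the Ghadimi et al. (2015) choice), and the residual-based stopping rule are all hypothetical. An eigendecomposition stands in for the SVD, which is equivalent for symmetric positive semidefinite covariance estimates.

```python
import numpy as np


def soft_threshold(M, t):
    """Elementwise soft-thresholding of M at level t."""
    return np.sign(M) * np.maximum(np.abs(M) - t, 0.0)


def da_qda_admm(S1, S2, lam, rho=1.0, max_iter=1000, tol=1e-8):
    """Hypothetical sketch of the quoted ADMM steps for estimating Omega.

    S1, S2 : symmetric positive semidefinite covariance estimates (p x p).
    lam    : sparsity penalty (lambda in the table).
    rho    : ADMM rate parameter; 1.0 is an arbitrary default (assumption).
    """
    p = S1.shape[0]

    # Step 1: factorize both covariance estimates once and precompute B.
    # eigh replaces the SVD here; the two coincide for symmetric PSD input.
    d1, U1 = np.linalg.eigh(S1)
    d2, U2 = np.linalg.eigh(S2)
    B = 1.0 / (np.outer(d1, d2) + rho)  # B[j, k] = 1 / (d1[j] * d2[k] + rho)

    # Zero initialization is an assumption; the source only says "initialize".
    Omega = np.zeros((p, p))
    Psi = np.zeros((p, p))
    Lam = np.zeros((p, p))

    for _ in range(max_iter):
        # Step 2: closed-form Omega update (signs in A are reconstructed).
        A = (S1 - S2) - Lam + rho * Psi
        Omega = U1 @ (B * (U1.T @ A @ U2)) @ U2.T

        # Step 3: Psi update by elementwise soft-thresholding.
        Psi_new = soft_threshold(Omega + Lam / rho, lam / rho)

        # Step 4: dual variable update.
        Lam = Lam + rho * (Omega - Psi_new)

        # Stopping rule (assumption): primal and dual residuals both small.
        primal = np.linalg.norm(Omega - Psi_new)
        dual = rho * np.linalg.norm(Psi_new - Psi)
        Psi = Psi_new
        if max(primal, dual) < tol:
            break

    # Psi is the sparse iterate, so it is returned as the estimate.
    return Psi
```

Calling `da_qda_admm(S1, S2, lam=0.1)` on two sample covariance matrices returns a sparse matrix. Under the reconstruction used here, setting `lam=0` with invertible inputs should recover the difference of the two precision matrices, `inv(S2) - inv(S1)`, which gives a quick sanity check on the sign conventions.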