High-dimensional Linear Discriminant Analysis Classifier for Spiked Covariance Model
Authors: Houssem Sifaou, Abla Kammoun, Mohamed-Slim Alouini
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical simulations, using both real and synthetic data, show that the proposed classifier yields better classification performance than the classical R-LDA while requiring lower computational complexity. |
| Researcher Affiliation | Academia | Houssem Sifaou, Abla Kammoun, Mohamed-Slim Alouini; Computer, Electrical and Mathematical Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, KSA |
| Pseudocode | No | The paper describes the proposed method using mathematical derivations and prose, but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures with structured steps. |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is released or provide a link to a code repository. The provided links are for the paper's license and attribution requirements. |
| Open Datasets | Yes | For real data simulation, we use two datasets. The first one is the USPS dataset which is one of the standard datasets for handwritten digit recognition. The dataset is publicly available at http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets. |
| Dataset Splits | Yes | Step 1: Let q0 be the ratio of the total number of samples in class C0 to the total number of samples in the full dataset. Denote by n_full the total number of samples in the full dataset. Choose n < n_full as the number of training samples; set n0 = ⌊q0 n⌋, where ⌊·⌋ is the floor function, and n1 = n − n0. Take ni training samples belonging to class Ci randomly from the full dataset. The remaining samples are used as a test dataset in order to estimate the misclassification rate. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies or their version numbers (e.g., programming languages, libraries, frameworks) used for the implementation or experiments. |
| Experiment Setup | Yes | In the synthetic data simulations, we use the following Monte Carlo protocol to estimate the true misclassification rate: Step 1: Set σ² = 1 and choose r = 3 orthogonal symmetry-breaking directions as follows: v1 = [1, 0, …, 0]^T, v2 = [0, 1, 0, …, 0]^T, v3 = [0, 0, 1, 0, …, 0]^T, with corresponding weights λ1 = 8, λ2 = 7, λ3 = 6. Set µ0 = (1/√p)[a, a, …, a]^T and µ1 = −µ0, where a is a finite constant. We choose a = 2 and a = 2.5. ... Step 3: Using the training set, design the improved LDA classifier as explained in Section 3 and determine the optimal parameter γ of R-LDA by grid search over γ ∈ {10^{i/10}, i = −10 : 1 : 10}. Step 5: Repeat Steps 2–4 500 times and determine the average true classification error of both classifiers. |
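The class-proportional train/test split described under "Dataset Splits" (n0 = ⌊q0 n⌋ training samples from class C0, n1 = n − n0 from class C1, the remainder held out for testing) can be sketched as follows. This is a minimal illustration of the reported protocol for a binary problem; the function name, arguments, and use of numpy are ours, not from the paper.

```python
import numpy as np

def split_dataset(X, y, n, rng=None):
    """Class-proportional split: n training samples drawn so that class C0
    contributes n0 = floor(q0 * n) of them; all remaining samples form the
    test set. Illustrative sketch, not the authors' code."""
    rng = np.random.default_rng(rng)
    n_full = len(y)
    assert n < n_full, "n must be smaller than the full dataset size"
    q0 = np.mean(y == 0)           # ratio of class-C0 samples in the full dataset
    n0 = int(np.floor(q0 * n))     # n0 = floor(q0 * n)
    n1 = n - n0                    # n1 = n - n0
    idx0 = rng.choice(np.flatnonzero(y == 0), size=n0, replace=False)
    idx1 = rng.choice(np.flatnonzero(y == 1), size=n1, replace=False)
    train = np.concatenate([idx0, idx1])
    test = np.setdiff1d(np.arange(n_full), train)  # everything not in training
    return (X[train], y[train]), (X[test], y[test])
```

Repeating this split with fresh random draws, as in the paper's Monte Carlo loop, gives independent estimates of the misclassification rate on the held-out portion.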
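The synthetic-data setup under "Experiment Setup" (spiked covariance Σ = σ²(I_p + Σᵢ λᵢ vᵢvᵢ^T) with r = 3 canonical-basis spikes, opposite class means ±µ0, and a γ grid of 10^{i/10} for the R-LDA search) can be sketched as below. The sampling trick and all names are our assumptions; the dimension p = 100 and sample sizes are placeholders for illustration only.

```python
import numpy as np

def sample_spiked(n, p, mu, lams, V, sigma2=1.0, rng=None):
    """Draw n samples from N(mu, sigma^2 (I_p + sum_i lam_i v_i v_i^T)),
    where the columns of V are orthonormal spike directions. Adding
    independent sqrt(lam_i)-scaled Gaussians along each v_i to an isotropic
    Gaussian yields exactly this covariance. Illustrative sketch."""
    rng = np.random.default_rng(rng)
    z = rng.standard_normal((n, p))              # isotropic component
    g = rng.standard_normal((n, len(lams)))      # one coefficient per spike
    spikes = (g * np.sqrt(lams)) @ V.T           # sum_i sqrt(lam_i) g_i v_i
    return mu + np.sqrt(sigma2) * (z + spikes)

p = 100                                  # placeholder dimension
V = np.eye(p)[:, :3]                     # v1, v2, v3: first canonical basis vectors
lams = np.array([8.0, 7.0, 6.0])         # lambda_1 = 8, lambda_2 = 7, lambda_3 = 6
a = 2.0
mu0 = (a / np.sqrt(p)) * np.ones(p)      # mu0 = (1/sqrt(p)) [a, ..., a]^T
X0 = sample_spiked(200, p, mu0, lams, V, rng=0)    # class C0
X1 = sample_spiked(200, p, -mu0, lams, V, rng=1)   # class C1, mean -mu0

# gamma grid for the R-LDA search in Step 3: {10^{i/10}, i = -10, ..., 10}
gammas = 10.0 ** (np.arange(-10, 11) / 10)
```

Each Monte Carlo repetition would regenerate the data, fit both classifiers, pick the best γ on this grid for R-LDA, and accumulate the test errors, which are then averaged over the 500 runs.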