High-dimensional Linear Discriminant Analysis Classifier for Spiked Covariance Model
Authors: Houssem Sifaou, Abla Kammoun, Mohamed-Slim Alouini
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical simulations, using both real and synthetic data, show that the proposed classifier yields better classification performance than the classical R-LDA while requiring lower computational complexity. |
| Researcher Affiliation | Academia | Houssem Sifaou, Abla Kammoun, Mohamed-Slim Alouini; Computer, Electrical and Mathematical Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, KSA |
| Pseudocode | No | The paper describes the proposed method using mathematical derivations and prose, but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures with structured steps. |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is released or provide a link to a code repository. The provided links are for the paper's license and attribution requirements. |
| Open Datasets | Yes | For real data simulation, we use two datasets. The first one is the USPS dataset which is one of the standard datasets for handwritten digit recognition. The dataset is publicly available at http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets. |
| Dataset Splits | Yes | Step 1: Let q0 be the ratio of the total number of samples in class C0 to the total number of samples in the full dataset. Denote by n_full the total number of samples in the full dataset. Choose n < n_full as the number of training samples; set n0 = ⌊q0 n⌋, where ⌊·⌋ is the floor function, and n1 = n − n0. Take ni training samples belonging to class Ci randomly from the full dataset. The remaining samples are used as a test dataset in order to estimate the misclassification rate. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies or their version numbers (e.g., programming languages, libraries, frameworks) used for the implementation or experiments. |
| Experiment Setup | Yes | In the synthetic data simulations, we use the following Monte Carlo protocol to estimate the true misclassification rate: Step 1: Set σ² = 1 and choose r = 3 orthogonal symmetry-breaking directions as follows: v1 = [1, 0, …, 0]^T, v2 = [0, 1, 0, …, 0]^T, v3 = [0, 0, 1, 0, …, 0]^T, with corresponding weights λ1 = 8, λ2 = 7, λ3 = 6. Set µ0 = (1/√p)[a, a, …, a]^T and µ1 = −µ0, where a is a finite constant. We choose a = 2 and a = 2.5. ... Step 3: Using the training set, design the improved LDA classifier as explained in Section 3 and determine the optimal parameter γ of R-LDA by grid search over γ ∈ {10^{i/10}, i = −10 : 1 : 10}. Step 5: Repeat Steps 2–4 500 times and determine the average true classification error of both classifiers. |
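The class-proportional train/test split described under "Dataset Splits" (n0 = ⌊q0 n⌋ training samples from class C0, n1 = n − n0 from class C1, the remainder held out for testing) can be sketched as follows. This is a minimal illustration of the reported protocol for a binary problem; the function name, arguments, and use of numpy are ours, not from the paper.

```python
import numpy as np

def split_dataset(X, y, n, rng=None):
    """Class-proportional split: n training samples drawn so that class C0
    contributes n0 = floor(q0 * n) of them; all remaining samples form the
    test set. Illustrative sketch, not the authors' code."""
    rng = np.random.default_rng(rng)
    n_full = len(y)
    assert n < n_full, "n must be smaller than the full dataset size"
    q0 = np.mean(y == 0)           # ratio of class-C0 samples in the full dataset
    n0 = int(np.floor(q0 * n))     # n0 = floor(q0 * n)
    n1 = n - n0                    # n1 = n - n0
    idx0 = rng.choice(np.flatnonzero(y == 0), size=n0, replace=False)
    idx1 = rng.choice(np.flatnonzero(y == 1), size=n1, replace=False)
    train = np.concatenate([idx0, idx1])
    test = np.setdiff1d(np.arange(n_full), train)  # everything not in training
    return (X[train], y[train]), (X[test], y[test])
```

Repeating this split with fresh random draws, as in the paper's Monte Carlo loop, gives independent estimates of the misclassification rate on the held-out portion.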
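The synthetic-data setup under "Experiment Setup" (spiked covariance Σ = σ²(I_p + Σᵢ λᵢ vᵢvᵢ^T) with r = 3 canonical-basis spikes, opposite class means ±µ0, and a γ grid of 10^{i/10} for the R-LDA search) can be sketched as below. The sampling trick and all names are our assumptions; the dimension p = 100 and sample sizes are placeholders for illustration only.

```python
import numpy as np

def sample_spiked(n, p, mu, lams, V, sigma2=1.0, rng=None):
    """Draw n samples from N(mu, sigma^2 (I_p + sum_i lam_i v_i v_i^T)),
    where the columns of V are orthonormal spike directions. Adding
    independent sqrt(lam_i)-scaled Gaussians along each v_i to an isotropic
    Gaussian yields exactly this covariance. Illustrative sketch."""
    rng = np.random.default_rng(rng)
    z = rng.standard_normal((n, p))              # isotropic component
    g = rng.standard_normal((n, len(lams)))      # one coefficient per spike
    spikes = (g * np.sqrt(lams)) @ V.T           # sum_i sqrt(lam_i) g_i v_i
    return mu + np.sqrt(sigma2) * (z + spikes)

p = 100                                  # placeholder dimension
V = np.eye(p)[:, :3]                     # v1, v2, v3: first canonical basis vectors
lams = np.array([8.0, 7.0, 6.0])         # lambda_1 = 8, lambda_2 = 7, lambda_3 = 6
a = 2.0
mu0 = (a / np.sqrt(p)) * np.ones(p)      # mu0 = (1/sqrt(p)) [a, ..., a]^T
X0 = sample_spiked(200, p, mu0, lams, V, rng=0)    # class C0
X1 = sample_spiked(200, p, -mu0, lams, V, rng=1)   # class C1, mean -mu0

# gamma grid for the R-LDA search in Step 3: {10^{i/10}, i = -10, ..., 10}
gammas = 10.0 ** (np.arange(-10, 11) / 10)
```

Each Monte Carlo repetition would regenerate the data, fit both classifiers, pick the best γ on this grid for R-LDA, and accumulate the test errors, which are then averaged over the 500 runs.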