reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Robust Principal Component Analysis using Density Power Divergence

Authors: Subhrajyoty Roy, Ayanendranath Basu, Abhik Ghosh

JMLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our theoretical findings are supported by extensive simulations and comparisons with existing robust PCA methods. We also showcase the proposed algorithm's applicability on two benchmark data sets and a credit card transactions data set for fraud detection.
Researcher Affiliation	Academia	Subhrajyoty Roy EMAIL Ayanendranath Basu EMAIL Abhik Ghosh EMAIL Interdisciplinary Statistical Research Unit Indian Statistical Institute Kolkata 700108, West Bengal, India
Pseudocode	No	Section 2.2 Algorithm for Efficient Computation of the rPCAdpd Estimator. The iteration rule for the rSVDdpd algorithm is then defined by the system of equations (13).
Open Source Code	No	The paper does not explicitly provide a link to source code, nor does it state that code is available in supplementary materials or will be released.
Open Datasets	Yes	For the Credit Card Fraud Detection Data set from Le Borgne et al. (2022). The data set encompasses 28 anonymized features over 284807 transactions, with only 0.1% (492) being fraudulent. For demonstration, we randomly sample 5% of the data set, including 19 fraudulent transactions. The first two data sets, namely the Car data set and the Octane data set are popular benchmark data sets used to compare performances of different RPCA algorithms (see Hubert et al. (2005) for details).
Dataset Splits	No	The paper mentions generating synthetic data for simulations and sampling a portion of a real dataset for demonstration, but it does not specify explicit training/test/validation splits for any of the datasets used in the experiments.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies	No	The rPCAdpd algorithm is implemented in R and publicly available as part of the `rPCAdpd` R-package. No specific version numbers for R or any other software dependencies are provided.
Experiment Setup	Yes	In each of these simulation scenarios, we keep the choice of r = 5 fixed, as more than 90% of the variability can be explained by the first 5 principal components. The rPCAdpd estimator with L1-median as the location estimator in these tables as the DPD method, with the robustness parameter shown in parenthesis. For demonstration, we randomly sample 5% of the data set, including 19 fraudulent transactions. The first 5 principal components, explaining over 80% of variation, are retained for both classical and rPCAdpd algorithms.
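
The Experiment Setup row describes two steps that a reader reproducing the study would need to pin down: drawing a 5% random subsample, and choosing the number of retained components from the share of variance explained. The following is a minimal Python sketch of both steps, not the authors' code and not the rPCAdpd estimator. The function names, the seed, and the example eigenvalue spectrum are hypothetical and chosen only for illustration; the paper does not report a seed.

```python
import random


def smallest_rank(eigenvalues, threshold):
    """Return the smallest r whose r largest eigenvalues explain at least
    `threshold` (a fraction in (0, 1]) of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for r, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return r
    return len(ordered)


def subsample_indices(n_rows, fraction, seed):
    """Draw a reproducible subset of row indices without replacement."""
    rng = random.Random(seed)  # the seed is an assumption; none is reported
    k = round(n_rows * fraction)
    return sorted(rng.sample(range(n_rows), k))


# Hypothetical eigenvalue spectrum, for illustration only (not from the paper).
example_eigenvalues = [40.0, 25.0, 12.0, 8.0, 6.0, 4.0, 2.0, 1.5, 1.0, 0.5]
rank_80 = smallest_rank(example_eigenvalues, 0.80)
rank_90 = smallest_rank(example_eigenvalues, 0.90)

# 5% of the 284807 transactions reported for the credit card data set.
rows = subsample_indices(284807, 0.05, seed=0)
```

Fixing the seed makes the subsample repeatable across runs, which is the detail the Dataset Splits row reports as unspecified. The eigenvalues passed to `smallest_rank` could come from either a classical or a robust covariance estimate; the sketch is agnostic to that choice.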