Robust Principal Component Analysis using Density Power Divergence
Authors: Subhrajyoty Roy, Ayanendranath Basu, Abhik Ghosh
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical findings are supported by extensive simulations and comparisons with existing robust PCA methods. We also showcase the proposed algorithm's applicability on two benchmark data sets and a credit card transactions data set for fraud detection. |
| Researcher Affiliation | Academia | Subhrajyoty Roy, Ayanendranath Basu, Abhik Ghosh. Interdisciplinary Statistical Research Unit, Indian Statistical Institute, Kolkata 700108, West Bengal, India |
| Pseudocode | No | Section 2.2 Algorithm for Efficient Computation of the rPCAdpd Estimator. The iteration rule for the rSVDdpd algorithm is then defined by the system of equations (13). |
| Open Source Code | No | The paper does not explicitly provide a link to source code, nor does it state that code is available in supplementary materials or will be released. |
| Open Datasets | Yes | The Credit Card Fraud Detection data set from Le Borgne et al. (2022) encompasses 28 anonymized features over 284807 transactions, with only 0.1% (492) being fraudulent. For demonstration, we randomly sample 5% of the data set, including 19 fraudulent transactions. The first two data sets, namely the Car data set and the Octane data set, are popular benchmark data sets used to compare the performance of different RPCA algorithms (see Hubert et al. (2005) for details). |
| Dataset Splits | No | The paper mentions generating synthetic data for simulations and sampling a portion of a real dataset for demonstration, but it does not specify explicit training/test/validation splits for any of the datasets used in the experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The rPCAdpd algorithm is implemented in R and publicly available as part of the `rPCAdpd` R-package. No specific version numbers for R or any other software dependencies are provided. |
| Experiment Setup | Yes | In each of these simulation scenarios, we keep the choice of r = 5 fixed, as more than 90% of the variability can be explained by the first 5 principal components. The rPCAdpd estimator with L1-median as the location estimator is shown in these tables as the DPD method, with the robustness parameter given in parentheses. For demonstration, we randomly sample 5% of the data set, including 19 fraudulent transactions. The first 5 principal components, explaining over 80% of variation, are retained for both classical and rPCAdpd algorithms. |
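
The Experiment Setup row fixes r = 5 because the first 5 principal components explain more than 90% of the variability. A minimal sketch of that kind of rule, assuming the eigenvalues of a covariance (or scatter) matrix are already available, is given below. The function name `smallest_r` and the eigenvalue spectrum are hypothetical illustrations, not taken from the paper or from the `rPCAdpd` package.

```python
from itertools import accumulate


def smallest_r(eigenvalues, threshold):
    """Return the smallest r such that the r largest eigenvalues
    account for at least `threshold` (a fraction) of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    # accumulate() yields the running sum of the sorted eigenvalues
    for r, partial in enumerate(accumulate(ordered), start=1):
        if partial / total >= threshold:
            return r
    # Guard against floating-point rounding when threshold is 1.0
    return len(ordered)


# Hypothetical spectrum for illustration only (not from the paper)
eigenvalues = [40.0, 25.0, 15.0, 8.0, 5.0, 3.0, 2.0, 1.0, 0.5, 0.5]
r = smallest_r(eigenvalues, 0.90)
print(r)
```

With a robust method the eigenvalues would come from the robust estimator rather than the sample covariance matrix; the thresholding step itself is unchanged.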