Efficient Sparse PCA via Block-Diagonalization
Authors: Alberto Del Pia, Dekun Zhou, Yinglun Zhu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform large-scale evaluations on many real-world datasets: for exact Sparse PCA algorithm, our method achieves an average speedup factor of 100.50, while maintaining an average approximation error of 0.61%; for approximate Sparse PCA algorithm, our method achieves an average speedup factor of 6.00 and an average approximation error of -0.91%, meaning that our method oftentimes finds better solutions. |
| Researcher Affiliation | Academia | Alberto Del Pia & Dekun Zhou, University of Wisconsin-Madison; Yinglun Zhu, University of California, Riverside |
| Pseudocode | Yes | Algorithm 1: Denoising procedure via thresholding; Algorithm 2: Matrix block-diagonalization; Algorithm 3: Efficient Sparse PCA via block-diagonalization; Algorithm 4: Estimation of E in Model 1; Algorithm 5: A practical version of Algorithm 3 without a given threshold |
| Open Source Code | No | The paper does not contain any explicit statement about releasing code, nor does it provide a link to a code repository in the main text or appendices. |
| Open Datasets | Yes | Datasets. We perform numerical tests on datasets widely studied in feature selection, including Colon Cov from Alon et al. (1999), Lymphoma Cov1 from Alizadeh et al. (1998), Reddit1500 and Reddit2000 from Dey et al. (2022b), Leukemia Cov, Lymphoma Cov2, and Prostate Cov from Dettling (2004), Arcene Cov and Dorothea Cov from Guyon et al. (2007), and GLI85Cov and GLABRA180Cov from Li (2020). We take the sample covariance matrices of features in these datasets as our input matrices A. The dimensions of A range from 500 to 100,000. Most of the matrices A are dense and hence are not block-diagonal, as discussed in Appendix C.1. |
| Dataset Splits | No | The paper mentions using "sample covariance matrices" as input but does not specify any training, validation, or test dataset splits. The experiments appear to be performed on the entire matrices. |
| Hardware Specification | Yes | Compute resources. We conducted all tests on a computing cluster equipped with 36 Cores (2x 3.1G Xeon Gold 6254 CPUs) and 768 GB of memory. |
| Software Dependencies | No | The paper mentions "Branch-and-Bound algorithm" and "Chan’s algorithm" but does not provide specific software names with version numbers for any libraries or solvers used in their implementation. |
| Experiment Setup | Yes | Parameters. (i) When integrating Algorithm 5 with the Branch-and-Bound algorithm, we choose the parameters ε := 1, a := 0, b := ‖A‖, d0 := 30, and δ := 0.01‖A‖. (ii) When integrating Algorithm 5 with Chan's algorithm, we choose the parameters ε := 1, a := 0, b := ‖A‖, d0 := 2k (twice the sparsity constant), and δ := 0.01‖A‖. We set a time limit of 3600 seconds for each method. To ensure reproducibility, we set the random seed to 42 and run the experiments ten times for each k. |
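The thresholding-plus-block-diagonalization idea named in the Pseudocode row (Algorithms 1–2) can be sketched as follows: zero out entries of the input matrix below a threshold δ, then group indices into blocks via connected components of the resulting support graph. This is a minimal illustration under those assumptions, not the authors' released implementation; the function name `block_diagonalize` and the toy matrix are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def block_diagonalize(A, delta):
    """Threshold |A| at delta, then return index groups given by the
    connected components of the thresholded support graph. Each group
    corresponds to one diagonal block of the denoised matrix."""
    S = (np.abs(A) > delta).astype(int)  # support after thresholding
    np.fill_diagonal(S, 1)               # keep every index in some block
    n_comp, labels = connected_components(csr_matrix(S), directed=False)
    return [np.flatnonzero(labels == c).tolist() for c in range(n_comp)]

# Toy example: two 2x2 blocks coupled only by entries below the threshold,
# so thresholding at delta = 0.1 separates them.
A = np.array([[2.00, 1.00, 0.01, 0.00],
              [1.00, 2.00, 0.00, 0.02],
              [0.01, 0.00, 3.00, 1.50],
              [0.00, 0.02, 1.50, 3.00]])
blocks = block_diagonalize(A, delta=0.1)
print(blocks)  # → [[0, 1], [2, 3]]
```

Once the blocks are found, Sparse PCA can be solved on each (much smaller) block independently, which is the source of the speedups reported in the Research Type row.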