Efficient Sparse PCA via Block-Diagonalization
Authors: Alberto Del Pia, Dekun Zhou, Yinglun Zhu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform large-scale evaluations on many real-world datasets: for exact Sparse PCA algorithm, our method achieves an average speedup factor of 100.50, while maintaining an average approximation error of 0.61%; for approximate Sparse PCA algorithm, our method achieves an average speedup factor of 6.00 and an average approximation error of -0.91%, meaning that our method oftentimes finds better solutions. |
| Researcher Affiliation | Academia | Alberto Del Pia & Dekun Zhou, University of Wisconsin-Madison; Yinglun Zhu, University of California, Riverside |
| Pseudocode | Yes | Algorithm 1: Denoising procedure via thresholding; Algorithm 2: Matrix block-diagonalization; Algorithm 3: Efficient Sparse PCA via block-diagonalization; Algorithm 4: Estimation of E in Model 1; Algorithm 5: A practical version of Algorithm 3 without a given threshold |
| Open Source Code | No | The paper does not contain any explicit statement about releasing code, nor does it provide a link to a code repository in the main text or appendices. |
| Open Datasets | Yes | Datasets. We perform numerical tests on datasets widely studied in feature selection, including Colon Cov from Alon et al. (1999), Lymphoma Cov1 from Alizadeh et al. (1998), Reddit1500 and Reddit2000 from Dey et al. (2022b), Leukemia Cov, Lymphoma Cov2, and Prostate Cov from Dettling (2004), Arcene Cov and Dorothea Cov from Guyon et al. (2007), and GLI85Cov and GLABRA180Cov from Li (2020). We take the sample covariance matrices of features in these datasets as our input matrices A. The dimensions of A range from 500 to 100,000. Most of the matrices A are dense and hence are not block-diagonal, as discussed in Appendix C.1. |
| Dataset Splits | No | The paper mentions using "sample covariance matrices" as input but does not specify any training, validation, or test dataset splits. The experiments appear to be performed on the entire matrices. |
| Hardware Specification | Yes | Compute resources. We conducted all tests on a computing cluster equipped with 36 Cores (2x 3.1G Xeon Gold 6254 CPUs) and 768 GB of memory. |
| Software Dependencies | No | The paper mentions "Branch-and-Bound algorithm" and "Chan’s algorithm" but does not provide specific software names with version numbers for any libraries or solvers used in their implementation. |
| Experiment Setup | Yes | Parameters. (i) When integrating Algorithm 5 with the Branch-and-Bound algorithm, we choose the parameters ε := 1, a := 0, b := ‖A‖, d0 := 30, and δ := 0.01‖A‖. (ii) When integrating Algorithm 5 with Chan's algorithm, we choose the parameters ε := 1, a := 0, b := ‖A‖, d0 := 2k (twice the sparsity constant), and δ := 0.01‖A‖. We set a time limit of 3600 seconds for each method. To ensure reproducibility, we set the random seed to 42 and run the experiments ten times for each k. |
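The thresholding-plus-block-diagonalization idea named in the Pseudocode row (Algorithms 1–2) can be sketched as follows: zero out entries of the input matrix below a threshold δ, then group indices into blocks via connected components of the resulting support graph. This is a minimal illustration under those assumptions, not the authors' released implementation; the function name `block_diagonalize` and the toy matrix are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def block_diagonalize(A, delta):
    """Threshold |A| at delta, then return index groups given by the
    connected components of the thresholded support graph. Each group
    corresponds to one diagonal block of the denoised matrix."""
    S = (np.abs(A) > delta).astype(int)  # support after thresholding
    np.fill_diagonal(S, 1)               # keep every index in some block
    n_comp, labels = connected_components(csr_matrix(S), directed=False)
    return [np.flatnonzero(labels == c).tolist() for c in range(n_comp)]

# Toy example: two 2x2 blocks coupled only by entries below the threshold,
# so thresholding at delta = 0.1 separates them.
A = np.array([[2.00, 1.00, 0.01, 0.00],
              [1.00, 2.00, 0.00, 0.02],
              [0.01, 0.00, 3.00, 1.50],
              [0.00, 0.02, 1.50, 3.00]])
blocks = block_diagonalize(A, delta=0.1)
print(blocks)  # → [[0, 1], [2, 3]]
```

Once the blocks are found, Sparse PCA can be solved on each (much smaller) block independently, which is the source of the speedups reported in the Research Type row.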