Sparse PCA via Covariance Thresholding
Authors: Yash Deshpande, Andrea Montanari
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1 presents simulations on synthetic data under the strictly sparse model, using the Covariance Thresholding algorithm of Table 1 (the version used in the proof of Theorem 3). The objective is to check whether the log p factor has any practical relevance or is a purely conceptual improvement. Figure 2 shows the performance of vanilla PCA, Diagonal Thresholding and Covariance Thresholding on the Three Peak example of Johnstone and Lu (2004). |
| Researcher Affiliation | Academia | Yash Deshpande, Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA; Andrea Montanari, Departments of Electrical Engineering and Statistics, Stanford University, Stanford, CA 94305, USA |
| Pseudocode | Yes | Algorithm 1 Covariance Thresholding |
| Open Source Code | No | No explicit statement or link to source code is provided in the paper. |
| Open Datasets | No | Figure 1 presents simulations on synthetic data under the strictly sparse model... Figure 2 shows the performance of vanilla PCA, Diagonal Thresholding and Covariance Thresholding on the Three Peak example... A similar experiment with the box example of Johnstone and Lu is provided in Figure 3. The paper describes using synthetic data and established problem examples, but does not provide access information (links, DOIs, specific repositories) for any particular dataset used in the empirical evaluations. |
| Dataset Splits | No | For notational convenience, we shall assume that $2n$ sample vectors are given (instead of $n$): $\{x_i\}_{1\le i\le 2n}$. We start by splitting the data into two halves, $(x_i)_{1\le i\le n}$ and $(x_i)_{n<i\le 2n}$, and compute the respective sample covariance matrices $G$ and $G'$. This describes how the data is used within the algorithm, not how a dataset is split to evaluate a model's performance on unseen data (e.g., train/validation/test splits). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the simulations or experiments. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers used for implementation or experiments. |
| Experiment Setup | Yes | Choosing $\tau$: Although in the statement of the theorem our choice of $\tau$ depends on the SNR $\beta/\sigma^2$, it is reasonable to instead threshold "at the noise level", as follows. The noise component of entry $(i,j)$ of the sample covariance (ignoring lower-order terms) is given by $\sigma^2\langle z_i, z_j\rangle/n$. By the central limit theorem, $\langle z_i, z_j\rangle/\sqrt{n} \xrightarrow{d} N(0,1)$. Consequently, $\sigma^2\langle z_i, z_j\rangle/n \approx N(0, \sigma^4/n)$, and we need to choose the (rescaled) threshold proportional to $\sqrt{\sigma^4} = \sigma^2$. Using previous estimates, we let $\tau = \nu\,\hat{\sigma}^2$ for a constant $\nu$. In simulations, a choice $3 \le \nu \le 4$ appears to work well. Parameters for Covariance Thresholding are chosen as in Section 4, with $\nu = 4.5$. |
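
The noise-level argument in the Experiment Setup row can be checked numerically. The Python sketch below is an illustration, not code from the paper: it assumes the noise vectors $z_i, z_j$ are independent standard Gaussian vectors in $\mathbb{R}^n$, treats $\hat\sigma = \sigma$ as known (an idealization), and the helper names `rescaled_noise_entries` and `fraction_above` are hypothetical. It draws many rescaled noise entries $\sqrt{n}\,\sigma^2\langle z_i, z_j\rangle/n$, compares their empirical standard deviation with $\sigma^2$, and reports the fraction of pure-noise entries that would exceed $\tau = \nu\hat\sigma^2$.

```python
import math
import random
import statistics


def rescaled_noise_entries(n, sigma, trials, seed=0):
    """Draw `trials` copies of sqrt(n) * sigma^2 * <z_i, z_j> / n, where z_i, z_j
    are independent standard Gaussian vectors in R^n (assumed noise model)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(trials):
        # <z_i, z_j>: sum of n products of independent N(0, 1) coordinates.
        inner = sum(rng.gauss(0.0, 1.0) * rng.gauss(0.0, 1.0) for _ in range(n))
        draws.append(math.sqrt(n) * sigma ** 2 * inner / n)
    return draws


def fraction_above(draws, nu, sigma_hat):
    """Fraction of pure-noise entries exceeding the rescaled threshold tau = nu * sigma_hat^2."""
    tau = nu * sigma_hat ** 2
    return sum(abs(d) > tau for d in draws) / len(draws)


sigma = 1.5
draws = rescaled_noise_entries(n=200, sigma=sigma, trials=2000)
sd = statistics.pstdev(draws)  # should be close to sigma^2 by the CLT argument
print(f"empirical sd {sd:.3f} vs sigma^2 = {sigma ** 2}")
for nu in (3.0, 4.0, 4.5):
    print(nu, fraction_above(draws, nu, sigma))
```

With $\nu$ between 3 and 4, only a small fraction of pure-noise entries (roughly a Gaussian tail beyond 3 to 4 standard deviations) should survive, which is the intuition behind thresholding at the noise level rather than at an SNR-dependent level.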
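
The sample-splitting step quoted in the Dataset Splits row and the threshold rule quoted in the Experiment Setup row can be combined into a rough, standard-library-only Python sketch. It is not the paper's Algorithm 1: the single-spike data generator, centering the covariance by subtracting $\hat\sigma^2 I$, estimating $\hat\sigma^2$ as the median diagonal entry, and using power iteration for the leading eigenvector are all assumptions made for illustration, and every function name is hypothetical. The second-half covariance is computed but left unused, since the cleanup step that consumes it is not described in this table.

```python
import math
import random
import statistics


def spiked_samples(num, p, k, beta, sigma, seed=0):
    """Hypothetical single-spike model x = sqrt(beta) * u * v + sigma * z, with
    u ~ N(0, 1), z ~ N(0, I_p), and v equal to 1/sqrt(k) on the first k coordinates."""
    rng = random.Random(seed)
    v = [1.0 / math.sqrt(k) if a < k else 0.0 for a in range(p)]
    rows = []
    for _ in range(num):
        u = rng.gauss(0.0, 1.0)
        rows.append([math.sqrt(beta) * u * v[a] + sigma * rng.gauss(0.0, 1.0)
                     for a in range(p)])
    return rows, v


def sample_cov(rows):
    """(1/n) * sum_i x_i x_i^T; no mean subtraction because the data are zero-mean."""
    n, p = len(rows), len(rows[0])
    cov = [[0.0] * p for _ in range(p)]
    for x in rows:
        for a in range(p):
            for b in range(p):
                cov[a][b] += x[a] * x[b] / n
    return cov


def soft_threshold(value, t):
    """Shrink value toward 0 by t; entries with |value| <= t become exactly 0."""
    if value > t:
        return value - t
    if value < -t:
        return value + t
    return 0.0


def top_eigenvector(m, iters=300):
    """Power iteration; assumes the top eigenvalue is positive and well separated."""
    p = len(m)
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(m[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w)) or 1.0  # guard against an all-zero matrix
        vec = [c / norm for c in w]
    return vec


def covariance_thresholding(data, nu=4.0):
    """Sketch: split 2n samples in half, soft-threshold the first half's centered
    sample covariance at nu * sigma2_hat / sqrt(n), return its top eigenvector."""
    n = len(data) // 2
    g1, g2 = sample_cov(data[:n]), sample_cov(data[n:2 * n])
    p = len(g1)
    # Assumed noise estimate: median diagonal entry (most coordinates are pure noise).
    sigma2_hat = statistics.median(g1[a][a] for a in range(p))
    t = nu * sigma2_hat / math.sqrt(n)  # tau / sqrt(n) with tau = nu * sigma_hat^2
    eta = [[soft_threshold(g1[a][b] - (sigma2_hat if a == b else 0.0), t)
            for b in range(p)] for a in range(p)]
    # g2 (second-half sample covariance) is kept for a cleanup step not shown here.
    return top_eigenvector(eta), g2


data, v = spiked_samples(num=400, p=30, k=5, beta=3.0, sigma=1.0, seed=1)
v_hat, _ = covariance_thresholding(data, nu=4.0)
overlap = abs(sum(a * b for a, b in zip(v, v_hat)))
support = sorted(sorted(range(len(v_hat)), key=lambda a: -abs(v_hat[a]))[:5])
print(round(overlap, 3), support)
```

Soft thresholding at $\tau/\sqrt{n}$ zeroes out most pure-noise entries while the $k\times k$ signal block survives, so in this toy setting the leading eigenvector of the thresholded matrix concentrates on the true support.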