Statistical Query Lower Bounds for Tensor PCA

Authors: Rishabh Dudeja, Daniel Hsu

JMLR 2021 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We provide a sharp analysis of the optimal sample complexity in the Statistical Query (SQ) model and show that SQ algorithms with polynomial query complexity not only fail to solve Tensor PCA in the conjectured hard phase, but also have a strictly sub-optimal sample complexity compared to some polynomial time estimators such as the Richard-Montanari spectral estimator. Our analysis reveals that the optimal sample complexity in the SQ model depends on whether ET1 is symmetric or not. For symmetric, even order tensors, we also isolate a sample size regime in which it is possible to test if ET1 = 0 or ET1 = 0 with polynomially many queries but not estimate ET1. Our proofs rely on the Fourier analytic approach of Feldman et al. (2018) to prove sharp SQ lower bounds.
Researcher Affiliation Academia Rishabh Dudeja EMAIL Department of Statistics Columbia University New York, NY 10027, USA Daniel Hsu EMAIL Department of Computer Science Columbia University New York, NY 10027, USA
Pseudocode No The paper describes "optimal SQ procedures" in Section 3 and Appendix E, but it does not present them in a structured pseudocode block or explicitly label any section as "Pseudocode" or "Algorithm". The description is in paragraph form.
Open Source Code No The paper does not contain any statement about releasing source code for the described methodology, nor does it provide any links to a code repository.
Open Datasets No In the Tensor PCA problem introduced by Richard and Montanari (2014), one is given a dataset consisting of n samples T1:n of i.i.d. Gaussian tensors of order k with the promise that ET1 is a rank-1 tensor and ET1 = 1. This describes the theoretical problem setup, not an actual dataset used for experiments. The paper is theoretical and does not conduct experiments on datasets.
Dataset Splits No The paper is theoretical and describes a problem setup involving hypothetical 'n samples' rather than actual datasets. Therefore, it does not provide details on training/test/validation dataset splits.
Hardware Specification No The paper is theoretical and focuses on mathematical analysis and lower bounds. It does not describe any experimental procedures that would require specific hardware specifications.
Software Dependencies No The paper is theoretical and focuses on mathematical proofs and analysis. It does not describe any experimental implementations that would require specific software dependencies or version numbers.
Experiment Setup No The paper is theoretical and focuses on deriving lower bounds and optimal SQ procedures. It does not include any empirical experiments or mention specific hyperparameters, training schedules, or other experimental setup details.