Stochastic Canonical Correlation Analysis

Authors: Chao Gao, Dan Garber, Nathan Srebro, Jialei Wang, Weiran Wang

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | 'We study the sample complexity of canonical correlation analysis (CCA), i.e., the number of samples needed to estimate the population canonical correlation and directions up to arbitrarily small error. With mild assumptions on the data distribution, we show that in order to achieve ϵ-suboptimality in a properly defined measure of alignment between the estimated canonical directions and the population solution, we can solve the empirical objective exactly with N(ϵ, , γ) samples... Moreover, we can achieve the same learning accuracy by drawing the same level of samples and solving the empirical objective approximately with a stochastic optimization algorithm... Finally, we show that, given an estimate of the canonical correlation, the streaming version of the shift-and-invert power iterations achieves the same learning accuracy with the same level of sample complexity, by processing the data only once.' The paper focuses on theoretical analysis: sample complexity bounds and algorithm design with proofs. It includes no empirical validation on datasets and reports no experimental performance metrics.
Researcher Affiliation | Academia | Chao Gao, University of Chicago, Chicago, IL 60637, USA; Dan Garber, Technion - Israel Institute of Technology, Haifa, 3200003, Israel; Nathan Srebro, Toyota Technological Institute at Chicago, Chicago, IL 60637, USA; Jialei Wang, University of Chicago, Chicago, IL 60637, USA; Weiran Wang, Toyota Technological Institute at Chicago, Chicago, IL 60637, USA. All listed institutions are universities or academic research institutes.
Pseudocode | Yes | Algorithm 1: Streaming SVRG for $\min_w f(w)$. Algorithm 2: Non-uniform sampling SVRG for optimizing a finite sum of nonconvex components, $F(w) = \frac{1}{n}\sum_{i=1}^{n} f_i(w)$.
Open Source Code | No | The paper includes a license for the document itself ('License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v20/18-095.html.'), but it does not mention or link to any source code for the methods it describes.
Open Datasets | No | This paper is theoretical and analyzes sample complexity for several distribution classes (sub-Gaussian, regular polynomial-tail, bounded, and Gaussian distributions under the single canonical pair model). It does not use or provide access to any empirical datasets for experimental validation.
Dataset Splits | No | The paper is theoretical and does not perform experiments on specific datasets. Therefore, there are no dataset splits for training, validation, or testing.
Hardware Specification | No | The paper focuses on theoretical analysis, sample complexity, and algorithm design. It does not describe any experiments that would require specific hardware, so no hardware specifications are provided.
Software Dependencies | No | The paper describes theoretical algorithms and their convergence properties. It does not mention any specific software packages or libraries with version numbers required for implementation or reproduction of experimental results.
Experiment Setup | No | This paper is theoretical and does not present empirical experiments. Consequently, there are no details regarding experimental setup, such as hyperparameters, model initialization, or training schedules.
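The Research Type entry mentions shift-and-invert power iterations for CCA. The following is a minimal sketch of that idea under strong simplifying assumptions, not the authors' method. It uses known population covariances Σxx, Σyy, Σxy instead of samples, and exact linear solves instead of the stochastic, single-pass solvers the paper analyzes. CCA is written as the generalized eigenproblem A z = ρ B z with A = [[0, Σxy], [Σxyᵀ, 0]] and B = blockdiag(Σxx, Σyy). Power iteration is then run on (λB - A)^{-1} B for a shift λ slightly above the top canonical correlation. The function names (`shift_invert_cca`, `solve`, `matvec`, `dot`) and the toy covariances are hypothetical.

```python
import math


def matvec(M, z):
    # Dense matrix-vector product for lists of lists.
    return [sum(m * zj for m, zj in zip(row, z)) for row in M]


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def solve(M, rhs):
    # Gaussian elimination with partial pivoting (M is small and nonsingular).
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def shift_invert_cca(Sxx, Syy, Sxy, shift, iters=50):
    # Build A = [[0, Sxy], [Sxy^T, 0]] and B = blockdiag(Sxx, Syy).
    dx, dy = len(Sxx), len(Syy)
    n = dx + dy
    A = [[0.0] * n for _ in range(n)]
    B = [[0.0] * n for _ in range(n)]
    for i in range(dx):
        for j in range(dx):
            B[i][j] = Sxx[i][j]
        for j in range(dy):
            A[i][dx + j] = A[dx + j][i] = Sxy[i][j]
    for i in range(dy):
        for j in range(dy):
            B[dx + i][dx + j] = Syy[i][j]
    # shift * B - A is positive definite only if shift > top canonical correlation.
    M = [[shift * B[i][j] - A[i][j] for j in range(n)] for i in range(n)]
    z = [1.0] * n
    for _ in range(iters):
        z = solve(M, matvec(B, z))  # exact solve; the paper analyzes stochastic solvers
        scale = math.sqrt(dot(z, matvec(B, z)))
        z = [zi / scale for zi in z]
    u, v = z[:dx], z[dx:]
    su = math.sqrt(dot(u, matvec(Sxx, u)))
    sv = math.sqrt(dot(v, matvec(Syy, v)))
    u = [ui / su for ui in u]  # enforce u^T Sxx u = 1
    v = [vi / sv for vi in v]  # enforce v^T Syy v = 1
    rho = dot(u, matvec(Sxy, v))  # canonical correlation u^T Sxy v
    return rho, u, v


# Toy population covariances whose canonical correlations are 0.9 and 0.3.
Sxx = [[4.0, 0.0], [0.0, 1.0]]
Syy = [[1.0, 0.0], [0.0, 9.0]]
Sxy = [[1.8, 0.0], [0.0, 0.9]]
rho, u, v = shift_invert_cca(Sxx, Syy, Sxy, shift=1.0)
print(rho, u, v)
```

In this toy example the whitened cross-covariance Σxx^{-1/2} Σxy Σyy^{-1/2} is diag(0.9, 0.3), so the top canonical correlation is 0.9. For each generalized eigenvalue ρ, the matrix (λB - A)^{-1} B has eigenvalue 1/(λ - ρ). With λ = 1.0 the leading eigenvalue is therefore 10 and the runner-up is 1/0.7 ≈ 1.43, so each iteration shrinks the unwanted components by a factor of about 0.14. A shift closer to the top correlation converges faster but makes the linear systems worse conditioned. Because the shift must exceed the top canonical correlation, choosing it in practice depends on an estimate of that value.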
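The Pseudocode entry names two SVRG variants. The sketch below illustrates the general pattern of each on hypothetical one-dimensional quadratic losses. It is not a transcription of the paper's Algorithms 1 and 2. For the finite sum, sampling component i with probability proportional to its smoothness |a_i|, together with the 1/(n p_i) reweighting that keeps the gradient estimate unbiased, is an assumption. One component has negative curvature, which mirrors the nonconvex-components setting, while the average objective stays convex. For the streaming variant, estimating each snapshot gradient from a fresh mini-batch whose size doubles every epoch is likewise an assumed schedule. All function names and parameters are illustrative.

```python
import random


def nonuniform_svrg(a, b, w0, step, epochs, inner_steps, seed=0):
    # Hypothetical components f_i(w) = 0.5 * a[i] * (w - b[i])**2; a[i] < 0 makes
    # f_i nonconvex, but the average F(w) = (1/n) * sum_i f_i(w) is convex if sum(a) > 0.
    rng = random.Random(seed)
    n = len(a)
    total = sum(abs(ai) for ai in a)
    probs = [abs(ai) / total for ai in a]  # assumed: p_i proportional to smoothness |a_i|

    def grad(i, w):
        return a[i] * (w - b[i])

    w_snap = w0
    for _ in range(epochs):
        mu = sum(grad(i, w_snap) for i in range(n)) / n  # full gradient at the snapshot
        w = w_snap
        for _ in range(inner_steps):
            i = rng.choices(range(n), weights=probs)[0]
            # Dividing by n * p_i keeps the estimate unbiased for F'(w).
            w -= step * ((grad(i, w) - grad(i, w_snap)) / (n * probs[i]) + mu)
        w_snap = w  # the last inner iterate becomes the next snapshot
    return w_snap


def streaming_svrg(sample, w0, step, epochs, inner_steps, batch0, seed=0):
    # Hypothetical objective f(w) = E[0.5 * (x * w - y)**2]; sample(rng) returns a
    # fresh (x, y) pair, and every pair is used once (a single pass over the stream).
    rng = random.Random(seed)
    w_snap, batch = w0, batch0
    for _ in range(epochs):
        g_snap = 0.0
        for _ in range(batch):  # snapshot gradient estimated from fresh samples
            x, y = sample(rng)
            g_snap += x * (x * w_snap - y)
        g_snap /= batch
        w = w_snap
        for _ in range(inner_steps):
            x, y = sample(rng)
            # x*(x*w - y) - x*(x*w_snap - y) + g_snap simplifies to the line below.
            w -= step * (x * x * (w - w_snap) + g_snap)
        w_snap = w
        batch *= 2  # assumed schedule: doubling shrinks the snapshot-gradient error
    return w_snap


# Finite sum: F'(w) = (sum_i a_i * w - sum_i a_i * b_i) / n = (4.5 * w - 1) / 4.
w_fs = nonuniform_svrg([3.0, -1.0, 2.0, 0.5], [1.0, 2.0, -1.0, 4.0],
                       w0=0.0, step=0.1, epochs=40, inner_steps=50)


# Stream: y = 2 * x + independent noise.
def draw(rng):
    x = rng.gauss(0.0, 1.0)
    return x, 2.0 * x + 0.5 * rng.gauss(0.0, 1.0)


w_st = streaming_svrg(draw, w0=0.0, step=0.05, epochs=10, inner_steps=200, batch0=200)
print(w_fs, w_st)
```

In the finite-sum example the minimizer is w* = 1/4.5, where F'(w) vanishes. In the streaming example the population minimizer is w* = E[xy]/E[x²] = 2. There, the final error is limited mainly by the size of the last snapshot batch rather than by the number of inner steps, which is why the sketch grows the batch.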