reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Stochastic Canonical Correlation Analysis

Authors: Chao Gao, Dan Garber, Nathan Srebro, Jialei Wang, Weiran Wang

JMLR 2019

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We study the sample complexity of canonical correlation analysis (CCA), i.e., the number of samples needed to estimate the population canonical correlation and directions up to arbitrarily small error. With mild assumptions on the data distribution, we show that in order to achieve ϵ-suboptimality in a properly defined measure of alignment between the estimated canonical directions and the population solution, we can solve the empirical objective exactly with N(ϵ, , γ) samples... Moreover, we can achieve the same learning accuracy by drawing the same level of samples and solving the empirical objective approximately with a stochastic optimization algorithm... Finally, we show that, given an estimate of the canonical correlation, the streaming version of the shift-and-invert power iterations achieves the same learning accuracy with the same level of sample complexity, by processing the data only once. The paper focuses on theoretical analysis, sample complexity bounds, and algorithm design with proofs, without empirical validation on specific datasets or performance metrics from experiments.
Researcher Affiliation	Academia	Chao Gao EMAIL University of Chicago Chicago, IL 60637, USA; Dan Garber EMAIL Technion Israel Institute of Technology Haifa, 3200003, Israel; Nathan Srebro EMAIL Toyota Technological Institute at Chicago Chicago, IL 60637, USA; Jialei Wang EMAIL University of Chicago Chicago, IL 60637, USA; Weiran Wang EMAIL Toyota Technological Institute at Chicago Chicago, IL 60637, USA. All listed institutions are universities or academic research institutes.
Pseudocode	Yes	Algorithm 1: Streaming SVRG for $\min_w f(w)$. Algorithm 2: Non-uniform sampling SVRG for optimizing finite-sum of nonconvex components $F(w) = \frac{1}{n} \sum_{i=1}^{n} f_i(w)$.
Open Source Code	No	The paper includes a license for the document itself: 'License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v20/18-095.html.' However, it contains no explicit mention of, or link to, source code for the methodology it describes.
Open Datasets	No	This paper is theoretical and focuses on sample complexity analysis for several distribution classes (sub-Gaussian, regular polynomial-tail, and bounded, plus a Gaussian distribution called the single canonical pair model). It does not use or provide access to any specific empirical datasets for experimental validation.
Dataset Splits	No	The paper is theoretical and does not perform experiments on specific datasets. Therefore, there are no dataset splits for training, validation, or testing.
Hardware Specification	No	The paper focuses on theoretical analysis, sample complexity, and algorithm design. It does not describe any experiments that would require specific hardware, so no hardware specifications are provided.
Software Dependencies	No	The paper describes theoretical algorithms and their convergence properties. It does not mention any specific software packages or libraries with version numbers required for implementation or reproduction of experimental results.
Experiment Setup	No	This paper is theoretical and does not present empirical experiments. Consequently, there are no details regarding experimental setup, such as hyperparameters, model initialization, or training schedules.
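
The Research Type row above refers to solving the empirical CCA objective exactly. Since the paper releases no code, the following is only a minimal illustrative sketch of that batch step for the top canonical pair, using the standard whiten-then-SVD approach. The function name `top_canonical_pair`, the small ridge term `reg`, and the synthetic data are assumptions made for this example; they are not taken from the paper, and the sketch does not implement the paper's stochastic or streaming algorithms.

```python
import numpy as np


def top_canonical_pair(X, Y, reg=1e-8):
    """Top empirical canonical correlation and directions (hypothetical helper).

    X has shape (n, dx) and Y has shape (n, dy); rows are paired samples.
    `reg` is a small ridge term added for numerical stability (an assumption).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # center each view
    Yc = Y - Y.mean(axis=0)

    # Empirical auto-covariance and cross-covariance matrices.
    Sxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / n

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return (V / np.sqrt(w)) @ V.T

    Sxx_is = inv_sqrt(Sxx)
    Syy_is = inv_sqrt(Syy)

    # The top singular triplet of the whitened cross-covariance gives the
    # canonical correlation and the whitened canonical directions.
    U, s, Vt = np.linalg.svd(Sxx_is @ Sxy @ Syy_is)
    u = Sxx_is @ U[:, 0]
    v = Syy_is @ Vt[0, :]
    return s[0], u, v


if __name__ == "__main__":
    # Synthetic two-view data sharing one latent signal (illustrative only).
    rng = np.random.default_rng(0)
    z = rng.standard_normal((500, 1))
    X = np.hstack([z + 0.1 * rng.standard_normal((500, 1)),
                   rng.standard_normal((500, 2))])
    Y = np.hstack([rng.standard_normal((500, 2)),
                   z + 0.1 * rng.standard_normal((500, 1))])
    rho, u, v = top_canonical_pair(X, Y)
    print("estimated top canonical correlation:", rho)
```

The returned directions are normalized so that each projected view has unit (regularized) variance, which makes the sample correlation of `X @ u` and `Y @ v` match the returned value up to the ridge term.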