reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Stochastic Approximation for Canonical Correlation Analysis

Authors: Raman Arora, Teodor Vanislavov Marinov, Poorya Mianjy, Nati Srebro

NeurIPS 2017 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide experimental results for our proposed methods, in particular we compare capped-MSG which is the practical variant of Algorithm 1 with capping as defined in equation (10), and MEG (Algorithm 2 in the Appendix), on a real dataset, Mediamill [19], consisting of paired observations of videos and corresponding commentary. We compare our algorithms against CCALin of [8], ALS CCA of [24], and SAA, which is denoted by batch in Figure 1.
Researcher Affiliation	Academia	Raman Arora, Dept. of Computer Science, Johns Hopkins University, Baltimore, MD 21204; Teodor V. Marinov, Dept. of Computer Science, Johns Hopkins University, Baltimore, MD 21204; Poorya Mianjy, Dept. of Computer Science, Johns Hopkins University, Baltimore, MD 21204; Nathan Srebro, TTI-Chicago, Chicago, Illinois 60637
Pseudocode	Yes	Algorithm 1 Matrix Stochastic Gradient for CCA (MSG-CCA). Input: Training data {(x_t, y_t)}_{t=1}^T, step size η, auxiliary training data {(x′_i, …)}_{i=1}^… Output: M
Open Source Code	Yes	We make our implementation of the proposed algorithms and existing competing techniques available online (footnote 1: https://www.dropbox.com/sh/dkz4zgkevfyzif3/AABK9JlUvIUYtHvLPCBXLlpha?dl=0).
Open Datasets	Yes	We provide experimental results for our proposed methods, in particular we compare capped-MSG which is the practical variant of Algorithm 1 with capping as defined in equation (10), and MEG (Algorithm 2 in the Appendix), on a real dataset, Mediamill [19], consisting of paired observations of videos and corresponding commentary.
Dataset Splits	No	The paper mentions 'Training data' for Algorithm 1 and 'training dataset' in the Problem Formulation section but does not specify any particular train/validation/test splits (e.g., percentages, sample counts, or citations to predefined splits) needed for reproduction. It only states the total number of samples n = 10,000 for Mediamill.
Hardware Specification	No	The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. It only mentions 'CPU runtime' as a metric.
Software Dependencies	No	The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup	Yes	For both MSG and MEG we set the step size at iteration t to be η_t = 0.1/√t. The target dimensionality in our experiments is k ∈ {1, 2, 4}. To ensure that the problem is well-conditioned, we add λI for λ = 0.1 to the empirical estimates of the covariance matrices on Mediamill dataset.
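
The two numerical settings in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the garbled step size reads η_t = 0.1/√t, that the samples are already centered (so the covariance estimate is the average outer product), and that the function names step_size and regularized_covariance are hypothetical helpers introduced here for illustration only.

```python
import math


def step_size(t, eta0=0.1):
    """Assumed schedule eta_t = eta0 / sqrt(t) for iteration t = 1, 2, ..."""
    if t < 1:
        raise ValueError("t must be >= 1")
    return eta0 / math.sqrt(t)


def regularized_covariance(samples, lam=0.1):
    """Return (1/n) * sum_i x_i x_i^T + lam * I as a list of lists.

    Assumes the samples are already centered; lam = 0.1 matches the
    regularization value quoted in the table.
    """
    n = len(samples)
    d = len(samples[0])
    cov = [[0.0] * d for _ in range(d)]
    # Accumulate the average outer product of the samples.
    for x in samples:
        for i in range(d):
            for j in range(d):
                cov[i][j] += x[i] * x[j] / n
    # Add lam to the diagonal so the estimate is well-conditioned.
    for i in range(d):
        cov[i][i] += lam
    return cov


if __name__ == "__main__":
    toy = [[1.0, 0.0], [-1.0, 0.0]]  # toy centered data, not Mediamill
    print(step_size(4))
    print(regularized_covariance(toy))
```

Adding λ to the diagonal bounds the smallest eigenvalue of the estimate below by λ, which is why the second coordinate in the toy data (zero variance) still yields an invertible matrix.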