Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Generalization bounds for Kernel Canonical Correlation Analysis

Authors: Enayat Ullah, Raman Arora

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study the problem of multiview representation learning using kernel canonical correlation analysis (KCCA) and establish non-asymptotic bounds on the generalization error for regularized empirical risk minimization. In particular, we give fine-grained high-probability bounds on the generalization error ranging from O(n^{-1/6}) to O(n^{-1/5}), depending on underlying distributional properties, where n is the number of data samples. For the special case of finite-dimensional Hilbert spaces (such as linear CCA), our rates improve, ranging from O(n^{-1/2}) to O(n^{-1}). Finally, our results generalize to the problem of functional canonical correlation analysis over abstract Hilbert spaces.
Researcher Affiliation | Academia | Enayat Ullah (EMAIL), Department of Computer Science, Johns Hopkins University; Raman Arora (EMAIL), Department of Computer Science, Johns Hopkins University
Pseudocode | No | The paper describes mathematical formulations, theorems, lemmas, and proof sketches, but it does not contain any clearly labeled pseudocode or algorithm blocks. The methodologies are explained in prose and mathematical notation.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide links to any code repositories.
Open Datasets | No | The paper is theoretical in nature, focusing on generalization bounds for kernel CCA. It refers to 'data samples' and 'random variables' in an abstract sense but does not describe experiments using any specific, publicly available datasets.
Dataset Splits | No | The paper is a theoretical work establishing generalization bounds and does not describe experiments or empirical evaluations that would require specifying dataset splits (training, validation, test).
Hardware Specification | No | The paper is theoretical and focuses on mathematical proofs and bounds. It does not describe any experiments that would necessitate the use or specification of hardware such as GPUs, CPUs, or specific computing environments.
Software Dependencies | No | The paper is a theoretical study presenting mathematical bounds and analyses. It does not describe any computational experiments or implementations that would require listing specific software dependencies with version numbers.
Experiment Setup | No | The paper is a theoretical work that establishes generalization bounds and provides proof sketches. It does not describe any practical experiments, so no experimental setup details, hyperparameters, or training configurations are provided.
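For context on the objective the abstract refers to, the regularized empirical KCCA problem is commonly written in the following form (a sketch of the standard formulation in the literature, not quoted from the paper; here \(\widehat{\mathbb{E}}\) denotes the empirical expectation over the n samples and \(r > 0\) is the regularization parameter):

```latex
\max_{f \in \mathcal{H}_x,\; g \in \mathcal{H}_y}
\frac{\widehat{\mathbb{E}}\,[\,f(x)\,g(y)\,]}
     {\sqrt{\widehat{\mathbb{E}}\,[\,f(x)^2\,] + r\,\lVert f \rVert_{\mathcal{H}_x}^2}\;
      \sqrt{\widehat{\mathbb{E}}\,[\,g(y)^2\,] + r\,\lVert g \rVert_{\mathcal{H}_y}^2}}
```

Generalization bounds of the kind described in the abstract control the gap between this empirical correlation and its population counterpart as a function of n and r.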
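The finite-dimensional special case mentioned in the abstract (linear CCA) can be illustrated with a minimal NumPy sketch. This is not code from the paper; the synthetic two-view data and the helper `linear_cca` are purely illustrative of the regularized problem being analyzed.

```python
import numpy as np

# Two views sharing a latent signal z, each corrupted by independent noise.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 1))  # shared latent variable
X = np.hstack([z + 0.5 * rng.normal(size=(n, 1)), rng.normal(size=(n, 2))])
Y = np.hstack([z + 0.5 * rng.normal(size=(n, 1)), rng.normal(size=(n, 2))])

def linear_cca(X, Y, reg=1e-6):
    """Top canonical correlation via the regularized whitened cross-covariance."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    m = X.shape[0]
    Cxx = Xc.T @ Xc / m + reg * np.eye(X.shape[1])  # regularized covariances
    Cyy = Yc.T @ Yc / m + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / m
    # Whiten each view with a Cholesky factor, then the singular values of
    # Cxx^{-1/2} Cxy Cyy^{-1/2} are the canonical correlations.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    M = np.linalg.inv(Lx) @ Cxy @ np.linalg.inv(Ly).T
    return np.linalg.svd(M, compute_uv=False)[0]

rho = linear_cca(X, Y)  # empirical top canonical correlation, close to 0.8 here
```

With this construction the population canonical correlation is 1/(1 + 0.25) = 0.8, and the empirical estimate `rho` converges to it as n grows, which is exactly the kind of empirical-to-population gap the paper's bounds quantify.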