reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Towards a learning theory of representation alignment

Authors: Francesco Maria Gabriele Insulla, Shuo Huang, Lorenzo Rosasco

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this paper, we propose a learning-theoretic perspective to representation alignment. First, we review and connect different notions of alignment based on metric, probabilistic, and spectral ideas. Then, we focus on stitching, a particular approach to understanding the interplay between different representations in the context of a task. Our main contribution here is to relate the properties of stitching to the kernel alignment of the underlying representation. Our results can be seen as a first step toward casting representation alignment as a learning-theoretic problem. (b) We provide a generalization error bound of linear stitching with the kernel alignment of the underlying representation.
Researcher Affiliation	Academia	Francesco Insulla Institute of Computational and Mathematical Engineering Stanford University Stanford, CA 94305, USA EMAIL Shuo Huang Istituto Italiano di Tecnologia Genoa, GE 16163, Italy EMAIL Lorenzo Rosasco MaLGa Center, DIBRIS, Università di Genova, Genoa, GE 16146, Italy CBMM, Massachusetts Institute of Technology, Cambridge, MA 02139, USA Istituto Italiano di Tecnologia, Genoa, GE 16163, Italy EMAIL
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks. It primarily consists of mathematical definitions, theorems, and proofs.
Open Source Code	No	The paper does not contain any explicit statements about the release of source code for the methodology described, nor does it provide a link to a code repository.
Open Datasets	No	The paper is theoretical in nature and does not describe or utilize specific datasets with access information for its own contributions. It mentions "diverse datasets" in the context of large AI models, but not for its own experimental validation.
Dataset Splits	No	The paper is theoretical and does not describe experiments that would require dataset splits.
Hardware Specification	No	The paper is theoretical and does not describe experimental implementations or the hardware used to perform them.
Software Dependencies	No	The paper is theoretical and does not provide details about specific software dependencies or their version numbers.
Experiment Setup	No	The paper is theoretical and focuses on mathematical concepts and proofs, thus it does not include details on experimental setup, hyperparameters, or training configurations.