Transformers are Universal In-context Learners

Authors: Takashi Furuya, Maarten V de Hoop, Gabriel Peyré

ICLR 2025

Reproducibility Checklist

Variable | Result | LLM Response
Research Type | Theoretical | Our work provides a rigorous formalization of the expressivity and continuity of transformers operating over the space of probability distributions. The main mathematical results are the universality theorems, Theorems 1 and 2, for the unmasked and masked settings, respectively.
Researcher Affiliation | Academia | 1. Doshisha Univ. (EMAIL) 2. Rice Univ. (EMAIL) 3. CNRS, ENS, PSL Univ. (EMAIL)
Pseudocode | No | The paper focuses on theoretical proofs and mathematical derivations. There are no sections or figures labeled 'Pseudocode' or 'Algorithm', nor any structured code-like blocks.
Open Source Code | No | The paper makes no statement about releasing source code for the described methodology and provides no links to code repositories.
Open Datasets | No | The paper is theoretical and does not describe or use any specific datasets for experiments, so it provides no information about publicly available datasets. Appendix D.2 discusses linear regression in a theoretical context using a 'data distribution µ', but this is a mathematical abstraction, not a concrete dataset.
Dataset Splits | No | The paper does not describe any experiments or use specific datasets, so no training, validation, or test splits are reported.
Hardware Specification | No | The paper is theoretical and involves no experimental runs requiring specific hardware, so no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and involves no software implementation or experiments, so no software dependencies with version numbers are listed.
Experiment Setup | No | The paper is theoretical, focusing on mathematical proofs and universal approximation theorems. It does not describe any practical experiments, hyperparameters, or training configurations.