Transformers are Universal In-context Learners
Authors: Takashi Furuya, Maarten V. de Hoop, Gabriel Peyré
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our work provides a rigorous formalization of transformer expressivity and continuity, with transformers operating over the space of probability distributions. The main mathematical results are the universality theorems, Theorems 1 and 2, for the unmasked and masked settings, respectively (see the sketch after the table). |
| Researcher Affiliation | Academia | ¹Doshisha Univ. (EMAIL); ²Rice Univ. (EMAIL); ³CNRS, ENS, PSL Univ. (EMAIL) |
| Pseudocode | No | The paper focuses on theoretical proofs and mathematical derivations. There are no sections or figures explicitly labeled as 'Pseudocode' or 'Algorithm', nor are there any structured code-like blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide links to any code repositories. |
| Open Datasets | No | The paper is theoretical and does not describe or use any specific datasets for experiments; therefore, it provides no information about publicly available datasets. Appendix D.2 discusses linear regression in a theoretical context using a "data distribution µ", but this is a mathematical abstraction rather than a concrete dataset (see the sampling sketch after the table). |
| Dataset Splits | No | The paper does not describe any experiments or use specific datasets, so there is no information about training, validation, or test splits. |
| Hardware Specification | No | The paper is theoretical and does not involve experimental runs requiring specific hardware. Thus, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not involve implementing software or running experiments. Therefore, no software dependencies with version numbers are listed. |
| Experiment Setup | No | The paper is theoretical and focuses on mathematical proofs and universal approximation theorems. It does not describe any practical experiments, hyperparameters, or training configurations. |
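The formalization summarized under "Research Type" treats attention as acting on a probability distribution of context tokens rather than on a finite sequence. The LaTeX sketch below records one standard way to write such a layer; the matrices Q, K, V and the exact normalization are our notational assumptions and may differ from the paper's precise definitions.

```latex
% Hedged sketch: attention as a map on probability measures.
% A token x \in \mathbb{R}^d is updated against a context
% distribution \mu \in \mathcal{P}(\mathbb{R}^d); Q, K, V are
% learned matrices (our notation, not necessarily the paper's).
\[
  \Gamma_{\mu}(x) \;=\;
  \frac{\int e^{\langle Q x,\, K y \rangle}\, V y \,\mathrm{d}\mu(y)}
       {\int e^{\langle Q x,\, K y \rangle}\,\mathrm{d}\mu(y)} .
\]
% When \mu = \tfrac{1}{n}\sum_{i=1}^{n} \delta_{y_i} is the
% empirical measure of n context tokens, this reduces to the
% usual softmax attention over (y_1, \dots, y_n).
```

Writing the layer against a measure µ is what lets the universality statements hold uniformly in the number of context tokens, since the empirical case above is just one choice of µ.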
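Likewise, the "data distribution µ" mentioned under "Open Datasets" is a sampling abstraction, not a dataset. Below is a hypothetical Python sketch of drawing one in-context linear-regression prompt from such a µ; the dimensions, noise level, and function names are our illustrative choices, not specifics from the paper.

```python
import numpy as np

def sample_prompt(rng: np.random.Generator, n_context: int = 16, d: int = 4,
                  noise: float = 0.1):
    """Draw a task w ~ N(0, I), then n_context labeled pairs from mu.

    Illustrative sketch only: the paper treats mu abstractly and does
    not fix these distributional choices.
    """
    w = rng.standard_normal(d)                           # latent regression weights
    x = rng.standard_normal((n_context, d))              # context inputs x_i ~ N(0, I)
    y = x @ w + noise * rng.standard_normal(n_context)   # noisy labels y_i
    x_query = rng.standard_normal(d)                     # query whose label is predicted
    return x, y, x_query, x_query @ w                    # (context, target answer)

rng = np.random.default_rng(0)
ctx_x, ctx_y, query, target = sample_prompt(rng)
print(ctx_x.shape, ctx_y.shape, query.shape, target)
```

In this reading, an in-context learner maps the sampled context (x, y) plus the query to an estimate of the target, without any weight updates; the "dataset" is regenerated from µ on every draw.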