Attention is Turing-Complete

Authors: Jorge Pérez, Pablo Barceló, Javier Marinkovic

JMLR 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We show that the Transformer with hard attention is Turing complete based solely on its capacity to compute and access internal dense representations of the data. Our study also reveals some minimal sets of elements needed to obtain this completeness result. The proof of Turing completeness of the Transformer is based on a direct simulation of a Turing machine, which we believe to be quite intuitive. (See the hard-attention sketch after this table.) |
| Researcher Affiliation | Academia | Jorge Pérez (EMAIL), Department of Computer Science, Universidad de Chile & IMFD Chile; Pablo Barceló (EMAIL), Institute for Mathematical and Computational Engineering, School of Engineering, Faculty of Mathematics, Universidad Católica de Chile & IMFD Chile; Javier Marinkovic (EMAIL), Department of Computer Science, Universidad de Chile & IMFD Chile |
| Pseudocode | No | The paper uses mathematical definitions and logical derivations to describe the Transformer architecture and its Turing-completeness proof, without presenting any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper contains no explicit statements about releasing source code for the methodology described, and it provides no links to code repositories. |
| Open Datasets | No | The paper is a theoretical work on the Turing completeness of Transformer networks and does not describe or use any datasets for empirical evaluation, so there is no information about open-dataset availability. |
| Dataset Splits | No | As a theoretical paper providing a proof of Turing completeness, it involves no empirical experiments with datasets, and therefore no dataset-split information is provided. |
| Hardware Specification | No | The paper is theoretical, proving Turing completeness of Transformer networks; it includes no experimental evaluations or simulations that would require specific hardware, so no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is a theoretical study of the computational power of Transformer networks and describes no implemented systems or experiments that would call for listing software dependencies with version numbers. |
| Experiment Setup | No | The paper presents a theoretical proof of Turing completeness for Transformer networks and includes no empirical experiments, so no experimental-setup or hyperparameter details are provided. |
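As a minimal illustration of the mechanism the Research Type row refers to, the NumPy sketch below implements hard attention in the averaging ("hardmax") sense we understand the paper to use: instead of a softmax over attention scores, attention concentrates entirely on the positions attaining the maximal score and averages their value vectors. The function name, shapes, and toy inputs are our own illustration, not artifacts from the paper.

```python
import numpy as np

def hard_attention(scores, values):
    """Hardmax-style attention sketch: attend only to the positions
    achieving the maximum score and average their value vectors
    (uniform weight 1/m over the m tied maxima, zero elsewhere)."""
    scores = np.asarray(scores, dtype=float)
    mask = scores == scores.max()      # positions attaining the maximum
    return values[mask].mean(axis=0)   # average over the tied positions

# Toy usage: three key positions with 2-dimensional values; the two
# tied maximal scores (0.9) are averaged.
values = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(hard_attention([0.3, 0.9, 0.9], values))  # -> [0.5 1. ]
```

This discreteness is what the simulation argument exploits: a hardmax head can act as an exact lookup into previously computed representations, whereas softmax only approximates such selection.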