Fundamental Limitations on Subquadratic Alternatives to Transformers
Authors: Josh Alman, Hantao Yu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we prove that any such approach cannot perform important tasks that Transformer is able to perform (assuming a popular conjecture from fine-grained complexity theory). We focus on document similarity tasks, where one is given as input many documents and would like to find a pair which is (approximately) the most similar. We prove that Transformer is able to perform this task, and we prove that this task cannot be performed in truly subquadratic time by any algorithm. Thus, any model which can be evaluated in subquadratic time (whether because of subquadratic-time heuristics for attention, faster attention replacements like Mamba, or any other reason) cannot perform this task. |
| Researcher Affiliation | Academia | Josh Alman, Department of Computer Science, Columbia University, New York, NY 10027, USA, EMAIL; Hantao Yu, Department of Computer Science, Columbia University, New York, NY 10027, USA, EMAIL |
| Pseudocode | No | The paper describes theoretical proofs and mathematical constructions but does not include any explicitly labeled pseudocode or algorithm blocks. The methods are described using mathematical notation and natural language. |
| Open Source Code | No | The paper does not contain any statements about open-sourcing code, providing a repository link, or including code in supplementary materials. |
| Open Datasets | No | This is a theoretical paper focusing on computational complexity and proofs, not empirical evaluation using specific datasets. While it discusses 'document similarity tasks' and 'bag-of-words embedding' as concepts, it does not use or provide access to any specific datasets for experiments. |
| Dataset Splits | No | As this is a theoretical paper and does not conduct experiments with empirical data, there is no mention of dataset splits for training, validation, or testing. |
| Hardware Specification | No | This paper is theoretical and focuses on proofs and complexity analysis. It does not describe any experimental setup or the specific hardware used for computations. |
| Software Dependencies | No | This is a theoretical paper, and as such, it does not detail any software dependencies or versions used for conducting experiments or implementing models. |
| Experiment Setup | No | This is a theoretical paper presenting proofs and complexity results, not empirical experiments. Therefore, it does not provide details about an experimental setup, including hyperparameters or training configurations. |
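The document similarity task at the heart of the paper (find the approximately most similar pair among many documents) can be illustrated with a brute-force baseline. This is a hedged sketch, not the paper's construction: the function names, the use of bag-of-words count vectors with inner-product similarity, and the whitespace tokenization are our own illustrative choices; the paper only discusses the task and its bag-of-words embedding abstractly. The point of the sketch is the quadratic pairwise loop, which is exactly the cost that the paper proves (under a fine-grained complexity conjecture) no truly subquadratic model can avoid.

```python
from collections import Counter
from itertools import combinations

def bag_of_words(doc, vocab):
    """Map a document to its bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(doc.split())
    return [counts[w] for w in vocab]

def most_similar_pair(docs):
    """Brute-force most-similar pair: compares all n*(n-1)/2 pairs of
    documents, hence runs in quadratic time in the number of documents."""
    vocab = sorted({w for d in docs for w in d.split()})
    vecs = [bag_of_words(d, vocab) for d in docs]
    best_pair, best_score = None, float("-inf")
    for i, j in combinations(range(len(docs)), 2):
        # Inner product of count vectors as a simple similarity score.
        score = sum(a * b for a, b in zip(vecs[i], vecs[j]))
        if score > best_score:
            best_pair, best_score = (i, j), score
    return best_pair, best_score
```

For example, `most_similar_pair(["a b c", "a b d", "x y z"])` returns the pair `(0, 1)`, since the first two documents share two words while the third shares none.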