Ehrenfeucht-Haussler Rank and Chain of Thought

Authors: Pablo Barceló, Alexander Kozachinskiy, Tomasz Steifer

ICML 2025

Reproducibility Assessment
Research Type: Theoretical. We present a novel characterization of rank, grounded in the well-known Transformer architecture. We show that the rank of a function f corresponds to the minimum number of Chain of Thought (CoT) steps required by a single-layer Transformer with hard attention to compute f. Based on this characterization, we establish tight bounds on the number of CoT steps required for specific problems, showing that ℓ-fold function composition necessitates exactly ℓ CoT steps. Furthermore, we analyze the problem of identifying the position of the k-th occurrence of 1 in a Boolean sequence, proving that it requires k CoT steps.
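The two lower-bound targets in the abstract can be illustrated concretely. The sketch below is my own (the function names and signatures are hypothetical, not from the paper): ℓ-fold composition iterates a function on a finite domain ℓ times, and the k-th One problem returns the position of the k-th occurrence of 1 in a Boolean sequence. The paper's claim is that a single-layer hard-attention Transformer needs exactly ℓ (respectively k) CoT steps for these tasks.

```python
def compose_l_fold(f, x, l):
    """Apply f to x a total of l times: f(f(...f(x)...)).

    f is given as a lookup table over the domain {0, ..., n-1}.
    """
    for _ in range(l):
        x = f[x]
    return x

def kth_one(bits, k):
    """0-based position of the k-th occurrence of 1 in a Boolean
    sequence, or None if fewer than k ones occur."""
    count = 0
    for i, b in enumerate(bits):
        if b == 1:
            count += 1
            if count == k:
                return i
    return None

# 3-fold composition of the cyclic shift on {0, 1, 2, 3}:
# 0 -> 1 -> 2 -> 3
f = [1, 2, 3, 0]
print(compose_l_fold(f, 0, 3))      # 3

# Ones sit at indices 1, 3, 4; the 2nd one is at index 3.
print(kth_one([0, 1, 0, 1, 1], 2))  # 3
```

Each application of f (or each scan for the next 1) corresponds to one decoding step in the paper's CoT model, which is why the step counts are ℓ and k.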
Researcher Affiliation: Academia. (1) Institute for Mathematical and Computational Engineering, Pontifical Catholic University of Chile; (2) Millennium Institute for Foundational Research on Data (IMFD Chile); (3) National Center for Artificial Intelligence (CENIA Chile); (4) Institute of Fundamental Technological Research, Polish Academy of Sciences. Correspondence to: Alexander Kozachinskiy <EMAIL>.
Pseudocode: No. The paper describes its methods and proofs in mathematical notation and conceptual steps but includes no explicitly labeled pseudocode or algorithm blocks. For example, it explains how a decision tree computes a function or how a decoder operates over iterations, but these are textual descriptions rather than structured algorithmic presentations.

Open Source Code: No. The paper contains no explicit statement about releasing source code and provides no links to code repositories. The text focuses on theoretical characterizations and proofs.

Open Datasets: No. The paper deals with theoretical objects such as the t-Comp_n and k-th One_n functions and runs no experiments on publicly available datasets. It mentions no specific datasets, links, or citations to data repositories.

Dataset Splits: No. The paper is theoretical and involves no empirical evaluation on datasets, so no training, validation, or test splits are mentioned.

Hardware Specification: No. The paper is theoretical, focusing on mathematical characterizations and proofs. It describes no experimental setup that would require specific hardware, so no hardware specifications are given.

Software Dependencies: No. The paper is theoretical and describes no implementations or experimental setups that would require specific software libraries or versions, so no software dependencies are mentioned.

Experiment Setup: No. The paper is theoretical, presenting novel characterizations and proofs related to Boolean functions and Transformers. It describes no practical experiments, hyperparameters, training configurations, or system-level settings.