Positional Attention: Expressivity and Learnability of Algorithmic Computation
Authors: Artur Back De Luca, George Giapitzakis, Shenghao Yang, Petar Veličković, Kimon Fountoulakis
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We prove that Transformers with positional attention (positional Transformers) maintain the same expressivity of parallel computational models... Finally, we empirically explore the out-of-distribution performance of positional Transformers and find that they perform well in tasks where their underlying algorithmic solution relies on positional information." and "7. Experiments. In this section, we evaluate the performance of positional and standard Transformers across various algorithmic tasks and regimes." |
| Researcher Affiliation | Collaboration | "1David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada. 2Google DeepMind, London, UK." |
| Pseudocode | Yes | "Input: Data = (Data1, . . . , Datan) distributed across the memories of n machines, labeled in [n]. An oracle RCVn,Data : [R] × [n] → [n] × (P([s]) \ {∅}). 1 For each round r = 1, . . . , R do 2 Each machine i simultaneously queries RCVn,Data with (r, i) as input and receives data from the machines and memory positions returned by the oracle. 3 The machines simultaneously perform local computations on the received data and write the results in their local memories." |
| Open Source Code | No | The paper does not contain an unambiguous statement of code release, a direct link to a code repository for the methodology described, or a clear statement that code is provided in supplementary material. |
| Open Datasets | No | "The training data is sampled in the range [−2, 2]. To ensure diversity in the data, for each input sample, we first select lower and upper bounds l and u uniformly in [−2, 2], and then for each of the n elements of the input sample, we select its value uniformly in [l, u]." and similar descriptions for other tasks. No concrete access information for datasets is provided. |
| Dataset Splits | Yes | "We train models for each fixed length n ∈ {2, 4, 8, 16, 32} using 30,000 samples across all settings... We report the performance in-distribution and out-of-distribution using 1,000 test samples." (from Section D.1.2) and "We train models with 500,000 samples and ensure that all input lengths are equally represented. We then evaluate the in-distribution (ID) and out-of-distribution (OOD) loss across different scale factors c ∈ {1, 2, . . . , 10}. The losses reported are calculated using 3,000 samples for each scale." (from Section D.1.4). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Adam' for optimization but does not provide specific software dependencies like programming languages, libraries, or frameworks with version numbers. |
| Experiment Setup | Yes | "Both models are trained end-to-end using the squared loss between the predicted and target vectors of size n, with no intermediate supervision. We train models with Adam, starting with a learning rate of 5 × 10⁻⁴ and a learning rate scheduler for a total of 2000 epochs." (from Section D.1.1). Additional details on layers, attention heads, MLP configurations, embedding dimensions, batch size, etc., are also provided across experimental setting sections. |
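The data-generation procedure quoted under "Open Datasets" (draw lower and upper bounds uniformly in [−2, 2], then draw each of the n input values uniformly between those bounds) can be sketched as follows. This is a minimal, hedged reconstruction: the function name, the use of NumPy, and the choice to sort the two bounds so that l ≤ u are all assumptions, not the paper's released code.

```python
import numpy as np

def sample_inputs(n, num_samples, low=-2.0, high=2.0, rng=None):
    """Sketch of the sampling scheme described in the paper's Appendix D:
    per sample, draw bounds l, u uniformly in [low, high], then draw the
    n element values uniformly in [l, u]. Sorting the bounds so l <= u is
    an assumption made here for well-definedness."""
    rng = np.random.default_rng() if rng is None else rng
    samples = np.empty((num_samples, n))
    for i in range(num_samples):
        l, u = np.sort(rng.uniform(low, high, size=2))  # per-sample bounds
        samples[i] = rng.uniform(l, u, size=n)          # n values in [l, u]
    return samples
```

Because each sample gets its own bounds, the resulting dataset mixes narrow-range and wide-range inputs, which is the "diversity" the quoted passage refers to.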