Linear Transformer Topological Masking with Graph Random Features
Authors: Isaac Reid, Kumar Dubey, Deepali Jain, William Whitney, Amr Ahmed, Joshua Ainslie, Alex Bewley, Mithun George Jacob, Aranyak Mehta, David Rendleman, Connor Schenck, Richard E Turner, René Wagner, Adrian Weller, Krzysztof Choromanski
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate strong accuracy gains on image data, as well as for modelling the dynamics of massive point clouds (> 30k particles) in robotics applications where efficiency is essential. ... In this section, we test our algorithms for topological masking with GRFs. We consider data modalities with different graph topologies: images and point clouds. ... Table 1 shows the final test accuracies for ImageNet (Deng et al., 2009), iNaturalist2021 (Horn et al., 2018) and Places365 (Zhou et al., 2018). |
| Researcher Affiliation | Collaboration | 1University of Cambridge, 2Google Research, 3Google DeepMind, 4Alan Turing Institute, 5Columbia University |
| Pseudocode | Yes | Alg. 1 presents our method. ... Algorithm 1 O(N) topologically-masked attention for general graphs |
| Open Source Code | No | Reproducibility statement: We have made every effort to ensure the work's reproducibility. The core algorithm is presented clearly in Alg. 1. |
| Open Datasets | Yes | Table 1 shows the final test accuracies for ImageNet (Deng et al., 2009), iNaturalist2021 (Horn et al., 2018) and Places365 (Zhou et al., 2018). ... We train and evaluate on the Kinetics 400 benchmark (Kay et al., 2017). |
| Dataset Splits | No | The paper mentions standard datasets like ImageNet, iNaturalist2021, Places365, and Kinetics 400 but does not explicitly state the dataset splits (e.g., percentages or sample counts) used for training, validation, or testing in the main text or supplementary tables. |
| Hardware Specification | No | For a hardware-agnostic comparison, we first compute the total number of FLOPs for evaluating (i) unmasked softmax, (ii) unmasked linear and (iii) GRF-masked linear attention for graphs of different sizes N. ... The paper does not explicitly mention specific hardware (e.g., GPU/CPU models, TPUs) used for running the experiments. It refers to a previous work (Whitney et al., 2024) for implementation details in one section, but does not specify hardware within this document. |
| Software Dependencies | No | The paper mentions using the AdamW optimiser (Loshchilov, 2017) and deep learning frameworks implicitly (e.g., PyTorch for ViT) but does not provide specific version numbers for any software libraries or dependencies. |
| Experiment Setup | Yes | Table 2: Architecture, hyperparameters and training details for ViT experiments. Num. layers 12; Num. heads 12; Num. patches 16; Hidden size 768; MLP dim. 3072; Optimiser Adam; Epochs 90; Base learning rate 3×10⁻³; Final learning rate 1×10⁻⁵; Learning rate schedule: linear warmup (10⁴ steps), constant, cosine decay; Batch size 4096 ... all models are trained with a batch size of 16; we use the AdamW optimiser (Loshchilov, 2017) with weight decay 10⁻³, clipping the gradient norm to 0.01; models are trained with 6 step rollouts, with losses computed on 128 sampled rays; |
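The core mechanism the Pseudocode row refers to, O(N) topologically-masked linear attention, relies on the fact that a low-rank mask can be folded into the attention features. The sketch below is a hypothetical NumPy illustration, not the paper's Alg. 1: it assumes the softmax kernel has already been replaced by nonnegative features `q_feat`/`k_feat` (Performer-style), and that `psi` holds graph random features whose Gram matrix `psi @ psi.T` approximates the desired topological mask.

```python
import numpy as np

def grf_masked_linear_attention(q_feat, k_feat, V, psi):
    """Linear attention with a low-rank topological mask (illustrative sketch).

    q_feat, k_feat : (N, F) nonnegative kernel features approximating softmax attention
    psi            : (N, m) graph random features; psi @ psi.T approximates the mask
    V              : (N, d) value vectors
    """
    N = len(q_feat)
    # Per-token outer product folds the mask into the attention features:
    # (phi_i ⊗ psi_i)·(phi_j ⊗ psi_j) = (phi_i·phi_j)(psi_i·psi_j)
    qm = np.einsum('nf,nm->nfm', q_feat, psi).reshape(N, -1)
    km = np.einsum('nf,nm->nfm', k_feat, psi).reshape(N, -1)
    num = qm @ (km.T @ V)        # right-to-left: never materialise the N×N matrix
    den = qm @ km.sum(axis=0)    # row-wise normaliser
    return num / den[:, None]
```

The key point is the bracketing: `km.T @ V` is (F·m, d), so cost scales linearly in N rather than quadratically, which is what makes masked attention over >30k-particle point clouds feasible. The result matches dense masked attention `((q_feat @ k_feat.T) * (psi @ psi.T)) @ V`, row-normalised, up to floating-point error.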