Low-Rank Thinning

Authors: Annabelle Michael Carrell, Albert Gong, Abhishek Shetty, Raaz Dwivedi, Lester Mackey

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To gauge the practical effectiveness of Alg. 1, we recreate the benchmark Tokens-To-Token Vision Transformer (T2T-ViT) and BigGAN image generation experiments of Zandieh et al. (2023). In the T2T-ViT experiment, attention approximations are scored on their ImageNet classification accuracy and computational expense when used as drop-in replacements for the two most expensive attention layers in a pretrained T2T-ViT neural network (Yuan et al., 2021). In the BigGAN experiment, approximations are scored on their computational expense and two popular measures of image generation quality, the Fréchet Inception Distance (FID, Heusel et al., 2017) and Inception Score (IS, Salimans et al., 2016). Using the exact implementations and settings provided by Zandieh et al. (2023), we benchmark our PyTorch implementation of Thinformer against exact attention and four leading attention approximations: Performer (Choromanski et al., 2021), Reformer (Kitaev et al., 2020), ScatterBrain (Chen et al., 2021), and KDEformer. In Tab. 3, we find that Thinformer (g = 2) provides the highest Top-1 accuracy on the ImageNet 2012 validation set (Russakovsky et al., 2015), while running faster than all of the alternatives. In Tab. 4, Thinformer (g = 2) yields better FID and IS than all of the alternatives while running significantly faster than exact attention, KDEformer, Reformer, and ScatterBrain.
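The quoted protocol scores each attention approximation on accuracy and wall-clock cost when swapped in for exact attention. A minimal, stdlib-only sketch of that kind of drop-in evaluation is below; `subsampled_attention` is a hypothetical uniform-subsampling baseline standing in for an approximation method, not the Thinformer algorithm itself, and all names here are illustrative.

```python
import math
import random
import time

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def exact_attention(Q, K, V):
    """Exact softmax attention: softmax(Q K^T / sqrt(d)) V, on lists of rows."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out

def subsampled_attention(Q, K, V, m, seed=0):
    """Hypothetical baseline: attend over a uniform subsample of m key/value
    pairs. A stand-in for a real approximation, NOT the paper's method."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(K)), m)
    return exact_attention(Q, [K[i] for i in idx], [V[i] for i in idx])

if __name__ == "__main__":
    rng = random.Random(1)
    n, d = 256, 16
    Q = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    K = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    V = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    # Score the approximation the way the benchmark does: runtime and error.
    t0 = time.perf_counter(); full = exact_attention(Q, K, V)
    t_full = time.perf_counter() - t0
    t0 = time.perf_counter(); approx = subsampled_attention(Q, K, V, m=64)
    t_approx = time.perf_counter() - t0
    err = max(abs(a - b) for ra, rb in zip(full, approx) for a, b in zip(ra, rb))
    print(f"exact: {t_full:.3f}s  approx: {t_approx:.3f}s  max abs error: {err:.3f}")
```

In the actual benchmark, accuracy is measured downstream (Top-1, FID, IS) rather than by entrywise error, but the time-versus-fidelity trade-off is the same quantity being reported in Tabs. 3 and 4.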
Researcher Affiliation | Collaboration | 1University of Cambridge, 2Cornell Tech, 3MIT, 4Microsoft Research New England. Correspondence to: Annabelle Carrell <EMAIL>, Albert Gong <EMAIL>, Abhishek Shetty <EMAIL>, Raaz Dwivedi <EMAIL>, Lester Mackey <EMAIL>.
Pseudocode | Yes | Algorithm 1: Thinformer
Open Source Code | Yes | We provide PyTorch code replicating this experiment at https://github.com/microsoft/thinformer and supplementary experiment details in App. L.1. See https://github.com/microsoft/khsgd for PyTorch code replicating this experiment and App. L.2 for supplementary experiment details. See https://github.com/microsoft/deepctt for PyTorch code replicating this experiment and App. L.3 for supplementary experiment details.
Open Datasets | Yes | "In Tab. 3, we find that Thinformer (g = 2) provides the highest Top-1 accuracy on the ImageNet 2012 validation set (Russakovsky et al., 2015), while running faster than all of the alternatives." "when we recreate the Home Mortgage Disclosure Act logistic regression experiment of Cooper et al. (2023) with a single worker (Fig. 1)" "To evaluate the practical utility of deep kernel CTT, we follow the Higgs mixture experiment of Domingo-Enrich et al. (2023, Sec. 5)"
Dataset Splits | Yes | In Tab. 3, we find that Thinformer (g = 2) provides the highest Top-1 accuracy on the ImageNet 2012 validation set (Russakovsky et al., 2015), while running faster than all of the alternatives.
Hardware Specification | Yes | The experiment of Tab. 3 was carried out using Python 3.12.9, PyTorch 2.8.0.dev20250407+cu128 (Paszke et al., 2019), and an Ubuntu 22.04.5 LTS server with an AMD EPYC 7V13 64-Core Processor, 220 GB RAM, and a single NVIDIA A100 GPU (80 GB memory, CUDA 12.8, driver version 570.124.04). The experiment of Tab. 4 was carried out using Python 3.12.9, PyTorch 2.6.0, and an Ubuntu 22.04.5 LTS server with an Intel(R) Xeon(R) Gold 5218 CPU Processor, 100 GB RAM, and a single NVIDIA A6000 GPU (48 GB memory, CUDA 12.1, driver version 530.30.02).
Software Dependencies | Yes | The experiment of Tab. 3 was carried out using Python 3.12.9, PyTorch 2.8.0.dev20250407+cu128 (Paszke et al., 2019), and an Ubuntu 22.04.5 LTS server with an AMD EPYC 7V13 64-Core Processor, 220 GB RAM, and a single NVIDIA A100 GPU (80 GB memory, CUDA 12.8, driver version 570.124.04). The experiment of Tab. 4 was carried out using Python 3.12.9, PyTorch 2.6.0, and an Ubuntu 22.04.5 LTS server with an Intel(R) Xeon(R) Gold 5218 CPU Processor, 100 GB RAM, and a single NVIDIA A6000 GPU (48 GB memory, CUDA 12.1, driver version 530.30.02).
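The hardware and software rows above record exactly the details (Python and PyTorch versions, OS, CPU, GPU, CUDA) a reader needs to reproduce the timings. A minimal sketch of collecting such an environment report follows; the GPU fields assume `torch` is installed and are skipped otherwise, and the function name is illustrative.

```python
import platform

def environment_report():
    """Gather the environment details a hardware/software specification
    like the one above records (stdlib only; torch fields are optional)."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "processor": platform.processor() or platform.machine(),
    }
    try:  # torch is optional; omit GPU details if it is not installed
        import torch
        info["torch"] = torch.__version__
        info["cuda"] = torch.version.cuda
        if torch.cuda.is_available():
            info["gpu"] = torch.cuda.get_device_name(0)
    except ImportError:
        pass
    return info

if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Logging such a report alongside each table makes runtime comparisons like Tabs. 3 and 4 interpretable across machines.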
Experiment Setup | Yes | Table L.1: Configurations for the attention approximation methods of Tab. 3. Table L.2: Configurations for the attention approximation methods of Tab. 4. Optimization was carried out with a learning rate of α = 0.01, datapoints were loaded in batches of size 16, and stochastic gradients were reordered for each datapoint individually. Each test is run with replication count B = 100, nominal level α = 0.05, and failure probability δ = 0.5. The neural network ϕ was trained exactly as in Liu et al. (2020) (with learning rate 5 × 10^-5 and batch size equal to the full training sample size), and runtime measurements exclude the time required to train ϕ.
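The testing setup above runs B = 100 replications of a hypothesis test at nominal level α = 0.05 and reports the rejection rate. A generic sketch of that replication protocol is below, using a simple mean-difference permutation test as a stand-in for the paper's kernel tests; both function names and the data-generating choices are illustrative assumptions, not the paper's configuration.

```python
import random
import statistics

def permutation_test(x, y, n_perm=200, seed=0):
    """Two-sample permutation test on |mean(x) - mean(y)|. A generic
    placeholder test, NOT the paper's deep kernel test."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(x) - statistics.mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of the pooled sample
        d = abs(statistics.mean(pooled[:len(x)]) - statistics.mean(pooled[len(x):]))
        if d >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # p-value with add-one correction

def rejection_rate(B=100, alpha=0.05, n=50, shift=0.0, seed=0):
    """Fraction of B replications rejecting at level alpha, mirroring the
    'replication count B = 100, nominal level alpha = 0.05' setup above."""
    rng = random.Random(seed)
    rejections = 0
    for b in range(B):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(shift, 1.0) for _ in range(n)]  # shift = 0 is the null
        if permutation_test(x, y, seed=b) < alpha:
            rejections += 1
    return rejections / B
```

Under the null (shift = 0) the rejection rate should hover near the nominal level α, while under an alternative (shift > 0) it estimates the test's power, which is the quantity such replication studies report.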