ESPFormer: Doubly-Stochastic Attention with Expected Sliced Transport Plans
Authors: Ashkan Shahbazi, Elaheh Akbari, Darian Salehi, Xinran Liu, Navid Naderializadeh, Soheil Kolouri
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments across multiple benchmark datasets, including image classification, point cloud classification, sentiment analysis, and neural machine translation, demonstrate that our enhanced attention regularization consistently improves performance across diverse applications. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Vanderbilt University, Nashville, TN, USA. 2Department of Computer Science, Duke University, Durham, NC, USA. 3Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA. 4Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN, USA. |
| Pseudocode | Yes | The overall pipeline of ESPFormer can be found in Algorithm 1. For completeness, we also report results from explicitly learning the slices and observe no significant benefits compared to axis-aligned slices. Algorithm 1 (ESPFormer's Doubly-Stochastic Attention). Input: query matrix Q ∈ ℝ^{m×N}, key matrix K ∈ ℝ^{m×N}, value matrix V ∈ ℝ^{d×N}, SoftSort hyperparameter t, and inverse-temperature hyperparameter τ. Output: attention-weighted output matrix. 1: Calculate the pairwise distance matrix [C]_{ij} = ‖Q_{:i} − K_{:j}‖². 2: for l = 1 to m do. 3: SoftSort the projected samples using (7): A_l = SoftSort_t(Q_{l:}), B_l = SoftSort_t(K_{l:}). 4: Calculate the transport plan U_l = (1/N) A_l^⊤ B_l. 5: Calculate D_l = Σ_{ij} [C]_{ij} [U_l]_{ij}. 6: end for. 7: Calculate σ_τ = softmax(D; τ). 8: Aggregate the plans from all slices: G = Σ_{l=1}^{m} σ_{τ,l} U_l. 9: Return V G. |
| Open Source Code | Yes | Our implementation code can be found at https://github.com/dariansal/ESPFormer. |
| Open Datasets | Yes | The ModelNet40 dataset (Wu et al., 2015) comprises 40 widely recognized 3D object categories... We next evaluate ESPFormer on the IMDB dataset (Maas et al., 2011) for sentiment analysis. ... We additionally evaluate ESPFormer on the TweetEval sentiment dataset (Barbieri et al., 2020)... trained on the IWSLT'14 German-to-English dataset (Cettolo et al., 2014)... conducted experiments on the Cats and Dogs dataset (Kaggle, 2013)... trained Transformer, Diff Transformer, Sinkformer, and ESPFormer on the MNIST dataset (LeCun, Bengio, and Haffner, 1998). |
| Dataset Splits | Yes | To evaluate the generalizability of the models under limited data scenarios, we conducted experiments on the Cats and Dogs dataset (Kaggle, 2013) using varying fractions of the training data: 1%, 10%, 25%, and 100%. |
| Hardware Specification | No | The paper does not explicitly state the specific hardware used for running its experiments (e.g., GPU models, CPU models, or memory details). |
| Software Dependencies | No | The paper mentions software like 'fairseq' and the 'Adam optimizer' (Kingma & Ba, 2015), and implicitly 'numpy' through code snippets, but it does not provide specific version numbers for these components to ensure reproducibility. For example, it mentions 'fairseq sequence modeling toolkit (Ott et al., 2019)' but not its version. |
| Experiment Setup | Yes | The training procedure employs a batch size of 64 and utilizes the Adam optimizer (Kingma & Ba, 2015). The network is trained for 300 epochs, with an initial learning rate of 10⁻³, which is reduced by a factor of 10 after 200 epochs. Table 8. Hyperparameters used in training for Set Transformers on the ModelNet40 dataset. |
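As a rough illustration of the pseudocode quoted above, the pipeline of Algorithm 1 can be sketched in NumPy. This is a minimal sketch, not the authors' implementation: the `soft_sort` helper is a simplified stand-in for the paper's SoftSort operator, and the slice-weighting `softmax(D; τ)` is read literally from the pseudocode (its sign convention is an assumption).

```python
import numpy as np

def soft_sort(s, t=0.1):
    # Row-stochastic relaxation of the sorting permutation:
    # row i of P concentrates on the entry of s with the i-th largest value.
    s_sorted = np.sort(s)[::-1]                           # descending sort
    logits = -np.abs(s_sorted[:, None] - s[None, :]) / t
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def esp_attention(Q, K, V, t=0.1, tau=1.0):
    # Q, K: (m, N) projected queries/keys; V: (d, N) values.
    m, N = Q.shape
    # Pairwise distance matrix [C]_ij = ||Q_:i - K_:j||^2
    C = ((Q.T[:, None, :] - K.T[None, :, :]) ** 2).sum(-1)
    U = np.empty((m, N, N))
    D = np.empty(m)
    for l in range(m):
        A = soft_sort(Q[l], t)
        B = soft_sort(K[l], t)
        U[l] = A.T @ B / N                # transport plan for slice l
        D[l] = (C * U[l]).sum()           # cost of slice l under its plan
    # Slice weights sigma_tau = softmax(D; tau); sign convention is an assumption.
    w = np.exp(tau * D - np.max(tau * D))
    w /= w.sum()
    G = np.tensordot(w, U, axes=1)        # aggregate plans across slices
    return V @ G                          # attention-weighted output, shape (d, N)
```

Each `U[l]` has marginals summing to 1 because the SoftSort matrices are row-stochastic, which is what makes the aggregated attention plan doubly stochastic up to the 1/N normalization.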