A Primal-Dual Framework for Transformers and Neural Networks
Authors: Tan Minh Nguyen, Tam Minh Nguyen, Nhat Ho, Andrea L. Bertozzi, Richard Baraniuk, Stanley Osher
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the advantages of the Attention-BN and Attention-SH in reducing head redundancy, increasing the model's accuracy, and improving the model's efficiency in a variety of practical applications including image and time-series classification. |
| Researcher Affiliation | Academia | Tan M. Nguyen* (Department of Mathematics, University of California, Los Angeles, EMAIL); Tam Nguyen* (Department of ECE, Rice University, EMAIL); Nhat Ho (Department of Statistics & Data Sciences, University of Texas at Austin, EMAIL); Andrea L. Bertozzi (Department of Mathematics, University of California, Los Angeles, EMAIL); Richard G. Baraniuk** (Department of ECE, Rice University, EMAIL); Stanley J. Osher** (Department of Mathematics, University of California, Los Angeles, EMAIL) |
| Pseudocode | No | No clearly labeled pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Implementation available at https://github.com/thuml/Flowformer. |
| Open Datasets | Yes | We empirically demonstrate the advantages of our Attention-BN, Attention-SH, and their combination (Attention-BN+SH) over the baseline softmax attention on the UEA time-series classification benchmark (Bagnall et al., 2018), the Long Range Arena benchmark (Tay et al., 2021), and the image classification task on the ImageNet dataset (Deng et al., 2009; Russakovsky et al., 2015). |
| Dataset Splits | Yes | The ImageNet dataset (Deng et al., 2009; Russakovsky et al., 2015) consists of 1.28M training images and 50K validation images. |
| Hardware Specification | Yes | All of our experiments are conducted on a server with 4 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper refers to third-party implementations and their respective GitHub repositories but does not explicitly list the specific versions of programming languages or software libraries used in their own experimental setup. |
| Experiment Setup | Yes | In our experiments, we consider the constant β in Attention-BN/BN+SH and the different downsampling scales in Attention-SH/SH+BN as hyperparameters to fine-tune. All of our experiments are conducted on a server with 4 NVIDIA A100 GPUs. In all models, the number of heads is 8, whereas the model dimension and number of transformer layers are varied. For Attention-SH/SH+BN, we downsample keys and values by a factor of 2 after every two successive heads. |
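The key/value downsampling schedule described in the setup row (8 heads, halving the key/value length after every two successive heads) can be sketched as below. This is a minimal NumPy illustration, not the authors' implementation: it assumes a single shared projection per head input (the paper's models use learned per-head projections), and it downsamples by simple strided subsampling, which is one plausible reading of "downsample by a factor of 2".

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(q, k, v):
    # Standard scaled dot-product attention for one head.
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def multihead_downsampled(q, k, v, num_heads=8):
    # Hypothetical sketch of the Attention-SH schedule: keys and values
    # are subsampled by an extra factor of 2 after every two heads,
    # so heads 0-1 see the full sequence, heads 2-3 see half, etc.
    outputs, stride = [], 1
    for h in range(num_heads):
        if h > 0 and h % 2 == 0:
            stride *= 2
        outputs.append(attention_head(q, k[::stride], v[::stride]))
    return np.concatenate(outputs, axis=-1)
```

With 8 heads the last pair attends over a sequence shortened by a factor of 8, which is where the efficiency gain over plain softmax attention would come from.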