Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Max-Margin Token Selection in Attention Mechanism

Authors: Davoud Ataee Tarzanagh, Yingcong Li, Xuechen Zhang, Samet Oymak

NeurIPS 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we verify our theoretical findings via numerical experiments and provide insights. (Section 4: Experiments)
Researcher Affiliation | Academia | Davoud Ataee Tarzanagh (University of Pennsylvania), Yingcong Li and Xuechen Zhang (University of California, Riverside), and Samet Oymak (University of Michigan / UC Riverside)
Pseudocode | No | The paper describes algorithms and mathematical formulations but does not contain any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | Yes | The code for experiments can be found at https://github.com/ucr-optml/max_margin_attention.
Open Datasets | Yes | To study softmax sparsity and the evolution of attention weights throughout training, we train a vision transformer (ViT-base) model [23] from scratch, utilizing the CIFAR10 dataset [24] for 400 epochs with fixed learning rate 3×10⁻³. [24] Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. The CIFAR-10 dataset. Online: http://www.cs.toronto.edu/kriz/cifar.html, 55(5), 2014.
Dataset Splits | No | The paper mentions using the CIFAR-10 dataset but does not explicitly describe the training, validation, and test splits with specific percentages or sample counts.
Hardware Specification | No | The paper describes the experiments but does not specify the hardware used (e.g., GPU models, CPU types, or cloud compute instances).
Software Dependencies | No | The paper mentions using PyTorch for implementation but does not specify software dependencies with version numbers.
Experiment Setup | Yes | During training, we use SGD optimizer with learning rate 0.1 and train the model for 1000 iterations. To study softmax sparsity and the evolution of attention weights throughout training, we train a vision transformer (ViT-base) model [23] from scratch, utilizing the CIFAR10 dataset [24] for 400 epochs with fixed learning rate 3×10⁻³.
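
The Experiment Setup row quotes two configurations: a small model trained with SGD (learning rate 0.1, 1000 iterations) and a ViT-base model trained from scratch on CIFAR-10 for 400 epochs at a fixed learning rate of 3×10⁻³. The sketch below is a minimal PyTorch loop illustrating only the second setup; it is not the authors' code (that is at the repository linked above). The model choice (torchvision's vit_b_16 as a stand-in for ViT-base), batch size, normalization constants, loss function, and optimizer for this run are assumptions not stated in the excerpt.

```python
# Minimal sketch of the ViT-base / CIFAR-10 run quoted above; NOT the authors' code.
# Assumed details (not in the paper excerpt): torchvision's vit_b_16 as a stand-in
# for ViT-base, batch size 128, standard CIFAR-10 normalization, cross-entropy loss.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as T

transform = T.Compose([
    T.Resize(224),  # vit_b_16 expects 224x224 inputs; CIFAR-10 images are 32x32
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)  # batch size assumed

device = "cuda" if torch.cuda.is_available() else "cpu"
# Trained from scratch (weights=None), 10 CIFAR-10 classes
model = torchvision.models.vit_b_16(weights=None, num_classes=10).to(device)

# Fixed learning rate 3e-3 for 400 epochs, as quoted; the optimizer type for this
# run is not specified in the excerpt, so plain SGD is assumed here.
optimizer = torch.optim.SGD(model.parameters(), lr=3e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(400):
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

The sketch keeps only the details present in the quote (training from scratch, 400 epochs, fixed learning rate); data augmentation, warmup, weight decay, and evaluation are omitted because the excerpt does not mention them.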