Elliptical Attention
Authors: Stefan Nielsen, Laziz Abdullaev, Rachel S.Y. Teo, Tan Nguyen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the advantages of Elliptical Attention over the baseline dot-product attention and state-of-the-art attention methods on various practical tasks, including object classification, image segmentation, and language modeling across different data modalities. |
| Researcher Affiliation | Collaboration | Stefan K. Nielsen, FPT Software AI Center, Ha Noi, Vietnam (EMAIL); Laziz U. Abdullaev, Department of Mathematics, National University of Singapore, Singapore 119077, Singapore (EMAIL); Rachel S.Y. Teo, Department of Mathematics, National University of Singapore, Singapore 119077, Singapore (EMAIL); Tan M. Nguyen, Department of Mathematics, National University of Singapore, Singapore 119077, Singapore (EMAIL) |
| Pseudocode | Yes | Pseudocode for the Elliptical Attention computation is provided in Appendix F.12. |
| Open Source Code | Yes | The code is publicly available at https://github.com/stefvk/Elliptical-Attention. |
| Open Datasets | Yes | We pretrain and evaluate our models on the WikiText-103 benchmark in comparison with the standard baseline Transformer [82], Performer [9], Transformer-MGK [52], FourierFormer [54], and the robust kernel density estimation-based Transformers including Transformer-SPKDE and Transformer-MoM [23]. |
| Dataset Splits | Yes | The validation set and test sets consist of 60 articles with 218K and 246K tokens respectively. |
| Hardware Specification | Yes | All models are trained and evaluated on two NVIDIA A100 SXM4 40GB GPUs. |
| Software Dependencies | No | The paper mentions 'default PyTorch settings' but does not specify version numbers for PyTorch or any other software libraries or dependencies. |
| Experiment Setup | Yes | We trained with Adam using a starting learning rate of 0.00025 and cosine scheduling under default PyTorch settings. We used a batch size of 96 and trained for 120 epochs with 2000 warmup steps. The train and evaluation target lengths were set to 256. |
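The reported schedule (base learning rate 0.00025, linear warmup for 2000 steps, then cosine decay) can be sketched as a plain learning-rate function. This is a minimal illustration, not the authors' code: the total step count and the final learning-rate floor are assumptions, since the paper states only 120 epochs with cosine scheduling under default PyTorch settings.

```python
import math

# Hyperparameters quoted from the reproducibility table.
BASE_LR = 2.5e-4
WARMUP_STEPS = 2000


def lr_at(step: int, total_steps: int, min_lr: float = 0.0) -> float:
    """Learning rate at a given optimizer step (0-indexed).

    Linear warmup from 0 to BASE_LR over WARMUP_STEPS, then cosine
    decay from BASE_LR down to min_lr over the remaining steps.
    (min_lr = 0 is an assumption; the paper does not state a floor.)
    """
    if step < WARMUP_STEPS:
        return BASE_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return min_lr + 0.5 * (BASE_LR - min_lr) * (1 + math.cos(math.pi * progress))
```

In a PyTorch run this shape is typically realized by wrapping `torch.optim.Adam` with a `LambdaLR` (or chaining `LinearLR` and `CosineAnnealingLR` via `SequentialLR`); the function above only shows the schedule itself.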