FourierFormer: Transformer Meets Generalized Fourier Integral Theorem
Authors: Tan Nguyen, Minh Pham, Tam Nguyen, Khai Nguyen, Stanley Osher, Nhat Ho
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we numerically justify the advantage of FourierFormer over the baseline dot-product transformer on five tasks: language modeling on WikiText-103 [46] (Section 4.1), image classification on ImageNet [22, 67] (Section 4.2), time series classification on the UEA benchmark [5] (Section 4.3), reinforcement learning on the D4RL benchmark [29] (Section 4.4), and machine translation on IWSLT'14 De-En [10] (Section 4.5). We theoretically prove that our proposed Fourier integral kernels can efficiently approximate any key and query distributions. |
| Researcher Affiliation | Academia | Tan M. Nguyen Department of Mathematics University of California, Los Angeles EMAIL Minh Pham Department of Mathematics University of California, Los Angeles EMAIL Tam Nguyen Department of ECE Rice University EMAIL Khai Nguyen Department of Statistics and Data Sciences University of Texas at Austin EMAIL Stanley J. Osher Department of Mathematics University of California, Los Angeles EMAIL Nhat Ho Department of Statistics and Data Sciences University of Texas at Austin EMAIL |
| Pseudocode | No | The paper describes the steps of self-attention but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our PyTorch code with documentation can be found at https://github.com/minhtannguyen/FourierFormer_NeurIPS. |
| Open Datasets | Yes | Language modeling on WikiText-103 [46] (Section 4.1), image classification on ImageNet [22, 67] (Section 4.2), time series classification on the UEA benchmark [5] (Section 4.3), reinforcement learning on the D4RL benchmark [29] (Section 4.4), and machine translation on IWSLT'14 De-En [10] (Section 4.5). |
| Dataset Splits | Yes | We report the validation and test perplexity (PPL) of FourierFormer versus the baseline transformer with the dot-product attention in Table 1. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments were provided in the paper. |
| Software Dependencies | No | The paper mentions 'Pytorch' but does not specify its version number or the version of CUDA used. |
| Experiment Setup | Yes | In all experiments, we made the constant R in Fourier attention (see equation (16)) a learnable scalar and chose the function φ(x) = x^4 (see Remark 2). |
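To make the experiment-setup row concrete, below is a minimal illustrative sketch of attention built from a Fourier-integral kernel with a learnable scalar R, as the row describes. This is an assumption-laden reconstruction, not the paper's released code: it uses the per-coordinate sinc-product kernel sin(R·u)/(π·u) classically associated with the Fourier integral theorem, omits the generalized φ(x) = x^4 variant, and all function names and the normalization scheme are hypothetical.

```python
import math


def fourier_kernel(q, k, R):
    """Product over coordinates of sin(R * (q_d - k_d)) / (pi * (q_d - k_d)).

    Assumed reading of the paper's Fourier attention kernel (eq. (16));
    the exact form in the paper may differ.
    """
    val = 1.0
    for qd, kd in zip(q, k):
        u = qd - kd
        if abs(u) < 1e-8:
            val *= R / math.pi  # limit of sin(R*u)/(pi*u) as u -> 0
        else:
            val *= math.sin(R * u) / (math.pi * u)
    return val


def fourier_attention(queries, keys, values, R=1.0):
    """Replace dot-product attention scores with the Fourier kernel.

    Scores are normalized by their sum (hypothetical choice); in training,
    R would be a learnable parameter as stated in the paper's setup.
    """
    out = []
    for q in queries:
        scores = [fourier_kernel(q, k, R) for k in keys]
        total = sum(scores) or 1.0
        weights = [s / total for s in scores]
        out.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return out
```

Note the kernel can take negative values, unlike softmax scores, so the sum-normalization here is only a stand-in for whatever normalization the authors actually use.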