Linearized Relative Positional Encoding
Authors: Zhen Qin, Weixuan Sun, Kaiyue Lu, Hui Deng, Dongxu Li, Xiaodong Han, Yuchao Dai, Lingpeng Kong, Yiran Zhong
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that compared with existing methods, LRPE achieves state-of-the-art performance in language modeling, text classification, and image classification. Table 1: Quantitative results of the RoBERTa model fine-tuned on the GLUE dataset. |
| Researcher Affiliation | Collaboration | 1Shanghai AI Laboratory 2Open NLPLab 3Australian National University 4Northwestern Polytechnical University 5The University of Hong Kong |
| Pseudocode | Yes | D.2 Pseudocode In this section, we provide pseudocodes for LRPE in Python: |
| Open Source Code | No | No explicit statement about the release of source code or a repository link for the methodology described in the paper was found. The section D.2 provides pseudocode for illustration, but not the full implementation code. |
| Open Datasets | Yes | Dataset: We use Wikitext-103 (Merity et al., 2016), Books (Zhu et al., 2015), and Wiki Book (Wettig et al., 2022) datasets for NLP task evaluation and ImageNet-1k (Deng et al., 2009) for image classification evaluation. ... pretrained and then fine-tuned on several downstream tasks from the GLUE benchmark (Wang et al., 2018). ... conducted experiments on the Long-Range Arena benchmark (Tay et al., 2020). |
| Dataset Splits | Yes | Dataset: We use Wikitext-103 (Merity et al., 2016), Books (Zhu et al., 2015), and Wiki Book (Wettig et al., 2022) datasets for NLP task evaluation and ImageNet-1k (Deng et al., 2009) for image classification evaluation. ... pretrained and then fine-tuned on several downstream tasks from the GLUE benchmark (Wang et al., 2018). ... conducted experiments on the Long-Range Arena benchmark (Tay et al., 2020). |
| Hardware Specification | Yes | Our experiments are implemented in the Fairseq framework (Ott et al., 2019) and trained with V100 GPUs. |
| Software Dependencies | No | Our experiments are implemented in the Fairseq framework (Ott et al., 2019) and trained with V100 GPUs. (Mentions Fairseq, but no specific version number.) |
| Experiment Setup | Yes | Table 8: Detailed configurations used in our experiments. Total batch size means batch_per_gpu × update_freq × num_gpus. Attention dropout is only used for vanilla attention. ALM: autoregressive Language Model. BLM: bidirectional Language Model. IM: Image Modeling. ... Table 9: Detailed configurations used in LRA experiments. BN stands for batch normalization. All methods use the same configuration, except for relative positional encodings. |
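The paper provides Python pseudocode in Appendix D.2 but no released implementation. For orientation, below is a minimal NumPy sketch of one member of the LRPE family, the rotary (unitary) case of which RoPE is an instance: each position's query/key vector is rotated in 2-D planes by position-dependent angles, so the inner product of a transformed query and key depends only on their relative offset. The function name `lrpe_rotary` and the frequency schedule are illustrative assumptions, not the authors' code.

```python
import numpy as np

def lrpe_rotary(x, base=10000.0):
    """Apply a rotary-style unitary LRPE transform.

    x: array of shape (seq_len, d) with d even.
    Each pair of channels (2i, 2i+1) is rotated by pos * theta_i,
    where theta_i follows the standard rotary frequency schedule
    (an assumption; the paper covers a broader unitary family).
    """
    seq_len, d = x.shape
    assert d % 2 == 0, "feature dimension must be even"
    theta = base ** (-2.0 * np.arange(d // 2) / d)   # (d/2,) per-pair frequencies
    angles = np.arange(seq_len)[:, None] * theta     # (seq_len, d/2) rotation angles
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                  # split channels into 2-D pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin               # rotate each pair by its angle
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because the per-position transforms are rotations, `lrpe_rotary(q)[s] @ lrpe_rotary(k)[t]` depends only on `t - s`, which is the relative-position property LRPE requires while remaining compatible with linear (kernelized) attention.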