Linearized Relative Positional Encoding

Authors: Zhen Qin, Weixuan Sun, Kaiyue Lu, Hui Deng, Dongxu Li, Xiaodong Han, Yuchao Dai, Lingpeng Kong, Yiran Zhong

TMLR 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experiments show that compared with existing methods, LRPE achieves state-of-the-art performance in language modeling, text classification, and image classification." Table 1: Quantitative results of the RoBERTa model fine-tuned on the GLUE dataset. |
| Researcher Affiliation | Collaboration | 1. Shanghai AI Laboratory; 2. OpenNLPLab; 3. Australian National University; 4. Northwestern Polytechnical University; 5. The University of Hong Kong |
| Pseudocode | Yes | "D.2 Pseudocode: In this section, we provide pseudocode for LRPE in Python." |
| Open Source Code | No | No explicit statement about the release of source code, and no repository link for the described methodology, was found. Section D.2 provides pseudocode for illustration, but not the full implementation. |
| Open Datasets | Yes | "We use Wikitext-103 (Merity et al., 2016), Books (Zhu et al., 2015), and Wiki-Book (Wettig et al., 2022) datasets for NLP task evaluation and ImageNet-1k (Deng et al., 2009) for image classification evaluation." The model is "pretrained and then fine-tuned on several downstream tasks from the GLUE benchmark (Wang et al., 2018)"; long-sequence evaluation is "conducted … on the Long-Range Arena benchmark (Tay et al., 2020)." |
| Dataset Splits | Yes | Same evidence as Open Datasets: Wikitext-103, Books, Wiki-Book, and ImageNet-1k for pretraining/evaluation, with fine-tuning on GLUE downstream tasks and experiments on the Long-Range Arena benchmark. |
| Hardware Specification | Yes | "Our experiments are implemented in the Fairseq framework (Ott et al., 2019) and trained with V100 GPUs." |
| Software Dependencies | No | Fairseq (Ott et al., 2019) is mentioned, but no specific version numbers are given. |
| Experiment Setup | Yes | "Table 8: Detailed configurations used in our experiments. Total batch size means batch_per_gpu × update_freq × num_gpus. Attention dropout is only used for vanilla attention. ALM: autoregressive language model; BLM: bidirectional language model; IM: image modeling." "Table 9: Detailed configurations used in LRA experiments. BN stands for batch normalization. All methods use the same configuration, except for relative positional encodings." |
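The Pseudocode row notes that the paper's Appendix D.2 gives Python pseudocode for LRPE. As a hedged illustration only (not the authors' code), the sketch below implements the rotary/unitary member of the LRPE family: each position t is encoded by a diagonal unitary map Λ^t (a per-channel-pair rotation), so the encoded dot product ⟨Λ^t q, Λ^s k⟩ depends only on the relative offset t − s. The function name `rotary_unitary` and the frequency vector `theta` are illustrative choices, not names from the paper.

```python
import numpy as np

def rotary_unitary(x, theta):
    """Apply a per-position unitary map Lambda^t to each row of x.

    This is the rotary instance of the LRPE family: at position t, pairs of
    channels are rotated by angle t * theta[j]. x has shape (seq_len, d) with
    d even; theta has shape (d // 2,).
    """
    seq_len, d = x.shape
    ang = np.arange(seq_len)[:, None] * theta[None, :]   # (seq_len, d // 2)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                   # 2x2 rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Demonstrate the relative-position property: with the same query/key vector
# repeated at every position, the encoded score depends only on t - s.
rng = np.random.default_rng(0)
d = 8
theta = 1.0 / (10000 ** (np.arange(d // 2) / (d // 2)))  # illustrative frequencies
q = np.tile(rng.standard_normal(d), (6, 1))
k = np.tile(rng.standard_normal(d), (6, 1))
qe, ke = rotary_unitary(q, theta), rotary_unitary(k, theta)
score_21 = qe[2] @ ke[1]   # offset t - s = 1
score_54 = qe[5] @ ke[4]   # offset t - s = 1
assert np.allclose(score_21, score_54)
```

Because the positional map is applied to queries and keys separately (rather than to the attention matrix), it composes with linear attention kernels, which is the decomposability that LRPE formalizes.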