Transformer-VQ: Linear-Time Transformers via Vector Quantization

Authors: Lucas Dax Lingle

Venue: ICLR 2024

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | "In our large-scale experiments, Transformer-VQ is shown highly competitive in quality, obtaining 0.99 bpb on Enwik8, 26.6 ppl on PG-19, and 3.16 bpb on ImageNet64. In addition, the optimized implementation of Transformer-VQ is over 3x faster than a comparable quadratic-time transformer at sequence length 8k, is over 12x faster at 32k, and can scale to 131k with similar throughput."
Researcher Affiliation | Industry | "Lucas D. Lingle, Independent Researcher"
Pseudocode | Yes | See pseudocode in Appendix E: "Code 1: Jax/Flax pseudocode for VQ-Attention." (An illustrative sketch of the VQ-Attention idea appears after this table.)
Open Source Code | Yes | Code available: https://github.com/transformer-vq/transformer_vq
Open Datasets | Yes | "Enwik8 is a byte-level language modeling dataset consisting of 100 million bytes of unprocessed English-language Wikipedia articles (Mahoney, 2011)... Per convention, it is split into train, validation, and test sets of 90 million, 5 million, and 5 million bytes, respectively (Child et al., 2019; Rae et al., 2020)."
Dataset Splits | Yes | "Per convention, it is split into train, validation, and test sets of 90 million, 5 million, and 5 million bytes, respectively (Child et al., 2019; Rae et al., 2020)." (A minimal split sketch appears after this table.)
Hardware Specification | Yes | "For training, we use TPU v3 pod slices (Jouppi et al., 2017). We benchmark on a TPU v3 with 8 cores, using a global batch size of 8 sequences."
Software Dependencies | No | No version numbers are given; the paper states only: "Transformer-VQ is implemented in Jax (Bradbury et al., 2018) and Flax (Heek et al., 2023)."
Experiment Setup | Yes | "C.1 HYPERPARAMETERS: Per-dataset hyperparameters are provided below. Table 10: Hyperparameters."
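
On the Pseudocode row: the paper's actual Jax/Flax pseudocode for VQ-Attention is in its Appendix E and the linked repository. Purely as a rough illustration of why quantizing keys yields linear-time attention, here is a minimal non-causal sketch under our own assumptions: the function name, shapes, and random codebook are ours, and the real model additionally handles causal masking, block-wise caching, multi-head layout, and learning the codebook.

```python
import jax
import jax.numpy as jnp

def vq_attention_sketch(q, k, v, codebook):
    """Minimal, non-causal sketch of the VQ-Attention idea (not the paper's code).

    q, k, v: [T, d] queries/keys/values for a single head.
    codebook: [S, d] codewords, with S much smaller than T.

    Once each key is snapped to its nearest codeword, exp(q . k_t) depends
    only on the codeword index z_t, so the softmax numerator and denominator
    can be accumulated per codeword: O(T*S) work instead of O(T^2).
    """
    # 1) Vector-quantize keys: nearest codeword by squared distance.
    d2 = jnp.sum((k[:, None, :] - codebook[None, :, :]) ** 2, axis=-1)  # [T, S]
    z = jnp.argmin(d2, axis=-1)                                         # [T]

    # 2) Query-codeword logits (max-subtracted for numerical stability;
    #    the per-query constant cancels between numerator and denominator).
    logits = q @ codebook.T                                             # [T, S]
    w = jnp.exp(logits - jnp.max(logits, axis=-1, keepdims=True))       # [T, S]

    # 3) Aggregate values and key counts per codeword (linear in T).
    onehot = jax.nn.one_hot(z, codebook.shape[0], dtype=v.dtype)        # [T, S]
    v_sum = onehot.T @ v                                                # [S, d]
    k_count = jnp.sum(onehot, axis=0)                                   # [S]

    # 4) softmax(Q K_hat^T) V, recovered from per-codeword statistics.
    return (w @ v_sum) / (w @ k_count)[:, None]                         # [T, d]

# Toy usage with random inputs (illustrative only).
rng = jax.random.PRNGKey(0)
kq, kk, kv, kc = jax.random.split(rng, 4)
q = jax.random.normal(kq, (128, 16))
k = jax.random.normal(kk, (128, 16))
v = jax.random.normal(kv, (128, 16))
codebook = jax.random.normal(kc, (8, 16))
out = vq_attention_sketch(q, k, v, codebook)  # [128, 16]
```

A quadratic-time baseline would instead materialize the full [T, T] matrix exp(q @ k.T); here every intermediate is [T, S] or [S, d], which is where the reported 3x/12x speedups at 8k/32k tokens come from.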
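
On the Dataset Splits row: the conventional Enwik8 split is a deterministic byte-level partition, sketched below in plain Python. The filename and variable names are illustrative assumptions, not taken from the Transformer-VQ repository.

```python
# Conventional Enwik8 split: first 90M bytes train, next 5M validation,
# final 5M test (Child et al., 2019; Rae et al., 2020).
# "enwik8" is the raw 100-million-byte file from Mahoney (2011); the path
# and names here are illustrative, not from the paper's code.
with open("enwik8", "rb") as f:
    data = f.read()
assert len(data) == 100_000_000
train, val, test = data[:90_000_000], data[90_000_000:95_000_000], data[95_000_000:]
```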