ZETA: Leveraging $Z$-order Curves for Efficient Top-$k$ Attention

Authors: Qiuhao Zeng, Jierui Huang, Peng Lu, Gezheng Xu, Boxing Chen, Charles Ling, Boyu Wang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that ZETA matches the performance of standard attention on the synthetic MULTI-QUERY ASSOCIATIVE RECALL task and outperforms attention and its variants on LONG RANGE ARENA and WIKITEXT-103 language modeling.
Researcher Affiliation | Collaboration | University of Western Ontario; Université de Montréal; Mila; Noah's Ark Lab; Vector Institute
Pseudocode | Yes | The pseudo-code in Algorithm 1 outlines the ZETA Top-k Attention mechanism, which combines Z-order curve projections with chunk-based sorting to efficiently identify and retrieve the top-k nearest neighbors while maintaining causal constraints.
Open Source Code | No | The paper states "Our implementation is based on Triton." and discusses its optimization, but provides no explicit statement of, or link to, a public release of the ZETA source code.
Open Datasets | Yes | We evaluate ZETA's performance on several aspects: ZETA's ability to solve the synthetic MULTI-QUERY ASSOCIATIVE RECALL task (Arora et al., 2024a), long sequence modeling ability on the LONG RANGE ARENA (LRA) benchmark, and auto-regressive language modeling on WIKITEXT-103.
Dataset Splits | Yes | For each model, we adopt the same hyperparameter settings provided by the official LRA benchmark (Tay et al., 2021) to ensure a fair comparison.
Hardware Specification | No | The paper mentions the use of "GPUs" and "Triton" for implementation and efficiency benchmarking, but does not specify any particular GPU models, CPU models, or other hardware configurations used for the experiments.
Software Dependencies | No | The paper mentions "PyTorch (Paszke et al., 2019)" and "Triton" but does not provide version numbers for these software components or any other libraries.
Experiment Setup | Yes | The ZETA model configuration generally involves setting the number of chunks to values such as 4, 8, 16, or 32 depending on the sequence length... The hidden dimension, d_V, is typically set to 256 or 512 with 8 attention heads when working with LRA datasets. However, for larger and more complex datasets such as WIKITEXT-103, the hidden dimension is increased to d_V = 768 with 12 attention heads... Additionally, the dimensions of keys and queries are kept significantly lower at d_K = d_Q = 3... In most of our experiments, we set k = 32...
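The pseudocode row above describes the core mechanism: project keys and queries into a low-dimensional space, map them onto a Z-order (Morton) curve, and use locality along the curve to retrieve top-k candidate neighbors. A minimal, hypothetical sketch of that idea follows; it is not the authors' Triton implementation, and the function names, window heuristic, and exact re-ranking step are illustrative assumptions.

```python
import bisect

def morton_code(coords, bits=10):
    """Interleave the bits of d integer coordinates into one Z-order (Morton) code.
    Nearby points in space tend to receive nearby codes, which is the locality
    property the top-k retrieval below relies on."""
    d = len(coords)
    code = 0
    for i in range(bits):
        for j, c in enumerate(coords):
            code |= ((c >> i) & 1) << (i * d + j)
    return code

def topk_candidates(query, keys, k, bits=10):
    """Approximate top-k nearest keys for a query via Z-order locality (sketch).
    keys: list of integer coordinate tuples (e.g., quantized low-dim projections).
    Returns indices of k candidate keys, nearest first."""
    # Sort all keys once by their Morton code (analogous to chunk-based sorting).
    coded = sorted((morton_code(kc, bits), idx) for idx, kc in enumerate(keys))
    codes = [c for c, _ in coded]
    # Locate the query on the curve and take a window of nearby codes.
    pos = bisect.bisect_left(codes, morton_code(query, bits))
    lo, hi = max(0, pos - k), min(len(coded), pos + k)
    window = coded[lo:hi]
    # Re-rank the candidate window by exact squared distance and keep k.
    window.sort(key=lambda ci: sum((a - b) ** 2 for a, b in zip(keys[ci[1]], query)))
    return [idx for _, idx in window[:k]]
```

For example, with keys `[(0,0,0), (1,1,1), (7,7,7), (2,2,2)]` and query `(1,1,0)`, the sketch returns indices `[1, 0]` for k = 2, since `(1,1,1)` and `(0,0,0)` are the closest keys. Causal masking (restricting candidates to earlier positions) is omitted here for brevity.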
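The experiment-setup row can be condensed into a configuration sketch. The field names below are illustrative assumptions, not taken from the authors' code; only the values come from the paper's reported settings.

```python
# Hypothetical configuration sketch of the reported ZETA hyperparameters.
# Field names are assumptions; values follow the paper's experiment setup.
LRA_CONFIG = {
    "num_chunks": 32,   # 4, 8, 16, or 32 depending on sequence length
    "d_V": 256,         # hidden dimension; 256 or 512 on LRA
    "num_heads": 8,
    "d_K": 3,           # keys/queries kept low-dimensional for the Z-order projection
    "d_Q": 3,
    "top_k": 32,        # k = 32 in most experiments
}

# WIKITEXT-103 uses a larger hidden dimension and more heads.
WIKITEXT103_CONFIG = {**LRA_CONFIG, "d_V": 768, "num_heads": 12}
```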