Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation

Authors: Itamar Zimerman, Ameen Ali, Lior Wolf

ICLR 2025

Reproducibility assessment (per variable: result, followed by the supporting LLM response):
Research Type: Experimental. "Our experiments show that our attention matrices and attribution method outperform an alternative and a more limited formulation that was recently proposed for Mamba. For the other architectures for which our method is the first to provide such a view, our method is effective and competitive in the relevant metrics compared to the results obtained by state-of-the-art Transformer explainability methods. Our code is publicly available."
Researcher Affiliation: Academia. "Itamar Zimerman, Ameen Ali, Lior Wolf. The Blavatnik School of Computer Science, Tel Aviv University. EMAIL, EMAIL"
Pseudocode: No. The paper provides mathematical formulations and descriptions of methods but does not include any explicit pseudocode or algorithm blocks.
Open Source Code: Yes. "Our code is publicly available." https://github.com/Itamarzimm/UnifiedImplicitAttnRepr
Open Datasets: Yes. "In the zero-shot setting, we utilized pre-trained Mamba-based LLMs with sizes of 1.3B and 2.8B on the ARC-E dataset (Clark et al., 2018), which evaluates the reasoning abilities of LLMs. ... We evaluated our proposed Mamba's implicit attention mechanism by comparing its generated foreground segmentation maps against ground truth from the ImageNet-Segmentation dataset (Guillaumin et al., 2014). ... To ensure a fair comparison, we fine-tune both DeiT-Small and ViM-Small models under identical conditions over the Pascal-VOC 2012 (Everingham et al., 2010) dataset, excluding multiscale training, inference, or any other modifications."
Dataset Splits: Yes. "The perturbation results for vision models are summarized in Table 1 for various explanation methods under both positive and negative perturbation scenarios on the ImageNet validation set. ... We evaluated our proposed Mamba's implicit attention mechanism by comparing its generated foreground segmentation maps against ground truth from the ImageNet-Segmentation dataset (Guillaumin et al., 2014). ... To ensure a fair comparison, we fine-tune both DeiT-Small and ViM-Small models under identical conditions over the Pascal-VOC 2012 (Everingham et al., 2010) dataset"
Hardware Specification: No. The paper mentions that Mamba is "parallelized on GPUs" but does not specify the exact hardware (GPU models, CPU models, memory, etc.) used for the experiments conducted in this study.
Software Dependencies: No. "All of our experiments are conducted using the PyTorch framework on public datasets." The paper mentions the PyTorch framework but does not specify its version or any other software dependencies with their respective version numbers.
Experiment Setup: No. "In the zero-shot setting, we utilized pre-trained Mamba-based LLMs with sizes of 1.3B and 2.8B on the ARC-E dataset (Clark et al., 2018), which evaluates the reasoning abilities of LLMs. ... To ensure a fair comparison, we fine-tune both DeiT-Small and ViM-Small models under identical conditions over the Pascal-VOC 2012 (Everingham et al., 2010) dataset, excluding multiscale training, inference, or any other modifications." The paper does not provide specific hyperparameters such as learning rates, batch sizes, number of epochs, or optimizer settings for its experiments.