Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
Authors: George Wang, Jesse Hoogland, Stan van Wingerden, Zach Furman, Daniel Murfet
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By applying these refined LLCs (rLLCs) to individual components of a two-layer attention-only transformer, we gain novel insights into the progressive differentiation and specialization of attention heads. Our methodology reveals how attention heads differentiate into distinct functional roles over the course of training, analyzes the types of data these heads specialize to process, and discovers a previously unidentified multigram circuit. These findings demonstrate that rLLCs provide a principled, quantitative toolkit for developmental interpretability, which aims to understand models through their evolution across the learning process. |
| Researcher Affiliation | Collaboration | George Wang (Timaeus); Jesse Hoogland (Timaeus); Stan van Wingerden (Timaeus); Zach Furman (Timaeus); Daniel Murfet (School of Mathematics and Statistics, The University of Melbourne) |
| Pseudocode | No | The paper describes methodology using text and mathematical equations (e.g., Section 3, Appendix D.1.1), but no structured pseudocode or algorithm blocks are explicitly presented. |
| Open Source Code | Yes | Our LLC estimation procedure is documented in Appendix F.2, which lists the SGLD hyperparameters used for estimating the Local Learning Coefficient and references detailed resources for implementing LLC estimation. Stan van Wingerden, Jesse Hoogland, George Wang, and William Zhou. Devinterp. https://github.com/timaeus-research/devinterp, 2024. |
| Open Datasets | Yes | Trained on next-token prediction on a subset of the Pile (Gao et al., 2020; Xie et al., 2023). q = q_GitHub is a distribution of code (CodeParrot, 2023). Common training & evaluation corpora: the Pile (Gao et al., 2020; Xie et al., 2023), TinyStories (Eldan & Li, 2023), and WikiText (Merity et al., 2016). Scientific domains, for which datasets are created by filtering arXiv abstracts by category (Massive Text Embedding Benchmark, 2024). Human languages, for which datasets are sampled from CC-100 (Conneau et al., 2020; Wenzek et al., 2020). |
| Dataset Splits | No | The paper mentions using a 'subset of the DSIR-filtered Pile' for training and describes sampling for analysis (e.g., 'filter a subset of 100k samples from the training dataset'), but does not explicitly provide conventional train/test/validation splits (e.g., specific percentages or sample counts for each split) for reproducibility of model performance evaluation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU specifications, or cloud computing instances with their configurations. |
| Software Dependencies | No | The paper mentions using 'the TransformerLens library (Nanda & Bloom, 2022)' and 'implementations provided by sklearn (Pedregosa et al., 2011) and tslearn (Tavenard et al., 2020)', but does not specify exact version numbers for these software dependencies to ensure reproducibility. |
| Experiment Setup | Yes | The model and training run considered in the main body is the same as in Hoogland et al. (2024). This was trained on a subset of the DSIR-filtered Pile (Gao et al., 2020; Xie et al., 2023) for a total of around 50,000 steps, with a batch size of 100. For estimating the Local Learning Coefficient (LLC), we employed Stochastic Gradient Langevin Dynamics (SGLD) with the following hyperparameters: SGLD step size η = 1e-3; inverse temperature β = 30/n; localization strength γ = 200; number of independent chains: 4; burn-in steps: 0; draws per chain: 200. Model architecture: context length: 1024 tokens; residual stream dimension: 256; number of attention heads per layer: 8; layer normalization: included; positional embedding: learnable Shortformer-style. |
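The SGLD-based LLC estimation described in the setup row can be sketched as follows. This is a minimal NumPy illustration on a toy quadratic loss, not the authors' devinterp implementation: the function name `estimate_llc` and the toy loss are hypothetical, while the hyperparameter values (step size 1e-3, nβ = 30, γ = 200, 4 chains, 200 draws, 0 burn-in) mirror those quoted from Appendix F.2. The estimator is λ̂(w*) = nβ · (E_SGLD[L(w)] − L(w*)), with SGLD chains localized around w* by a quadratic penalty of strength γ.

```python
import numpy as np

def estimate_llc(loss_fn, grad_fn, w_star, *, step_size=1e-3, nbeta=30.0,
                 gamma=200.0, num_chains=4, num_draws=200, burn_in=0, seed=0):
    """Sketch of LLC estimation via localized SGLD (hypothetical helper).

    lambda_hat = nbeta * (mean loss along SGLD chains - loss at w_star),
    where each chain follows the update
      w <- w - (eta/2) * (nbeta * grad L(w) + gamma * (w - w_star)) + sqrt(eta) * N(0, I).
    """
    rng = np.random.default_rng(seed)
    loss_at_star = loss_fn(w_star)
    chain_means = []
    for _ in range(num_chains):
        w = w_star.copy()
        draws = []
        for t in range(burn_in + num_draws):
            noise = rng.normal(0.0, np.sqrt(step_size), size=w.shape)
            drift = nbeta * grad_fn(w) + gamma * (w - w_star)
            w = w - 0.5 * step_size * drift + noise
            if t >= burn_in:
                draws.append(loss_fn(w))
        chain_means.append(np.mean(draws))
    return nbeta * (np.mean(chain_means) - loss_at_star)

# Toy example: quadratic loss L(w) = ||w||^2 / 2 around the minimum w* = 0.
w_star = np.zeros(3)
llc_hat = estimate_llc(lambda w: 0.5 * float(w @ w), lambda w: w, w_star)
print(f"estimated LLC: {llc_hat:.4f}")
```

In the paper the loss is the next-token-prediction loss of the transformer (or of a restricted component, for refined LLCs), and gradients are stochastic minibatch gradients rather than the exact toy gradient used here; the localization term γ(w − w*) keeps the chains sampling the local posterior around the trained checkpoint.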