Bilinear MLPs enable weight-based mechanistic interpretability
Authors: Michael Pearce, Thomas Dooms, Alice Rigg, Jose Oramas, Lee Sharkey
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Analyzing the spectra of bilinear MLP weights using eigendecomposition reveals interpretable low-rank structure across toy tasks, image classification, and language modeling. We consider models trained on the MNIST dataset of handwritten digits and the Fashion-MNIST dataset of clothing images. |
| Researcher Affiliation | Collaboration | Michael T. Pearce, Independent, pearcemt@alumni.stanford.edu; Thomas Dooms, University of Antwerp, thomas.dooms@uantwerpen.be; Alice Rigg, Independent, rigg.alice0@gmail.com; Jose Oramas, University of Antwerp, sqIRL/IDLab, EMAIL; Lee Sharkey, Apollo Research, EMAIL |
| Pseudocode | No | The paper describes its methods and procedures in narrative text, without including any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code at: https://github.com/tdooms/bilinear-decomposition |
| Open Datasets | Yes | We consider models trained on the MNIST dataset of handwritten digits and the Fashion-MNIST dataset of clothing images. ... (Eldan & Li, 2023) (see training details in Appendix G). ... The models used in the experiments shown in Figure 9 are trained on the FineWeb dataset (Penedo et al., 2024). |
| Dataset Splits | No | The paper uses well-known datasets like MNIST and Fashion-MNIST, but does not explicitly detail the training, validation, and test splits used for its experiments. For language models, it mentions 'context length 256' and 'context length 512' but no specific dataset splits (e.g., train/val/test percentages or counts). |
| Hardware Specification | Yes | We thank CoreWeave for providing compute for the finetuning experiments. ... we fine-tuned TinyLlama-1.1B, ... using a single A40 GPU. |
| Software Dependencies | No | The paper details experimental setups and hyperparameters in Appendix G but does not provide specific version numbers for software dependencies such as libraries, frameworks, or programming languages. |
| Experiment Setup | Yes | This section contains details about our architectures used and hyperparameters to help reproduce results. More information can be found in our code [currently not referenced for anonymity]. (followed by tables 1, 2, 3, and 4 listing specific hyperparameters like learning rate, batch size, optimizer, epochs, etc.) |
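The table's Research Type entry quotes the paper's core method: eigendecomposition of bilinear MLP weight interactions. As an illustrative sketch only (random weights and a hypothetical readout direction, not the paper's trained models), the symmetric interaction matrix for one output direction of a bilinear layer can be built and decomposed like this:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden = 8, 16

# Hypothetical bilinear MLP layer: hidden_k = (W x)_k * (V x)_k.
W = rng.normal(size=(d_hidden, d_in))
V = rng.normal(size=(d_hidden, d_in))

# For a readout direction u over the hidden units, the layer's output
# along u is the quadratic form x^T Q x with Q = sum_k u_k W_k V_k^T.
u = rng.normal(size=d_hidden)
Q = np.einsum("k,ki,kj->ij", u, W, V)

# Only the symmetric part of Q contributes to a quadratic form, and a
# symmetric matrix has a real eigendecomposition whose leading
# eigenvectors expose the (often low-rank) input structure.
Q_sym = 0.5 * (Q + Q.T)
eigvals, eigvecs = np.linalg.eigh(Q_sym)

# Sanity check: symmetrization leaves the quadratic form unchanged.
x = rng.normal(size=d_in)
assert np.isclose(x @ Q @ x, x @ Q_sym @ x)
```

Inspecting the eigenvectors paired with the largest-magnitude eigenvalues is the weight-based analysis step; in the paper this is applied per class logit or per output feature of trained models.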