Bilinear MLPs enable weight-based mechanistic interpretability

Authors: Michael Pearce, Thomas Dooms, Alice Rigg, Jose Oramas, Lee Sharkey

ICLR 2025

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "Analyzing the spectra of bilinear MLP weights using eigendecomposition reveals interpretable low-rank structure across toy tasks, image classification, and language modeling."

Researcher Affiliation | Collaboration | Michael T. Pearce (Independent, pearcemt@alumni.stanford.edu); Thomas Dooms (University of Antwerp, thomas.dooms@uantwerpen.be); Alice Rigg (Independent, rigg.alice0@gmail.com); Jose Oramas (University of Antwerp, sqIRL/IDLab, EMAIL); Lee Sharkey (Apollo Research, EMAIL)

Pseudocode | No | The paper describes its methods and procedures in narrative text, without any structured pseudocode or algorithm blocks.

Open Source Code | Yes | Code at: https://github.com/tdooms/bilinear-decomposition

Open Datasets | Yes | "We consider models trained on the MNIST dataset of handwritten digits and the Fashion-MNIST dataset of clothing images. ... (Eldan & Li, 2023) (see training details in Appendix G). ... The models used in the experiments shown in Figure 9 are trained on the FineWeb dataset (Penedo et al., 2024)."

Dataset Splits | No | The paper uses well-known datasets such as MNIST and Fashion-MNIST but does not explicitly detail the train/validation/test splits used in its experiments. For the language models, it mentions "context length 256" and "context length 512" but gives no split percentages or example counts.

Hardware Specification | Yes | "We thank CoreWeave for providing compute for the finetuning experiments. ... we fine-tuned TinyLlama-1.1B ... using a single A40 GPU."

Software Dependencies | No | The paper details experimental setups and hyperparameters in Appendix G but does not provide version numbers for software dependencies such as libraries, frameworks, or programming languages.

Experiment Setup | Yes | "This section contains details about our architectures used and hyperparameters to help reproduce results. More information can be found in our code [currently not referenced for anonymity]." (Followed by Tables 1, 2, 3, and 4 listing specific hyperparameters such as learning rate, batch size, optimizer, and epochs.)
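The Research Type row above quotes the paper's core method: eigendecomposing bilinear MLP weights to expose low-rank structure. A minimal NumPy sketch of that idea follows, assuming a bilinear layer of the form g(x) = (Wx) ⊙ (Vx); all names and dimensions here are illustrative and not taken from the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden = 16, 32

# A bilinear MLP layer computes g(x) = (W x) * (V x), elementwise.
W = rng.normal(size=(d_hidden, d_in))
V = rng.normal(size=(d_hidden, d_in))

# A readout direction u over the hidden units (e.g., one logit's weights).
u = rng.normal(size=d_hidden)

# Along that direction the layer is a quadratic form x^T Q x with
# Q_ij = sum_h u_h W_hi V_hj; symmetrize, since x^T Q x = x^T Q^T x.
Q = np.einsum("h,hi,hj->ij", u, W, V)
Q_sym = 0.5 * (Q + Q.T)

# Eigendecomposition of the symmetric interaction matrix: eigenvectors
# are input directions, eigenvalues their signed contribution strengths,
# so a few large-magnitude eigenvalues indicate low-rank structure.
eigvals, eigvecs = np.linalg.eigh(Q_sym)

# Sanity check: the spectrum reproduces the layer's output along u.
x = rng.normal(size=d_in)
direct = u @ ((W @ x) * (V @ x))
via_spectrum = sum(lam * (v @ x) ** 2 for lam, v in zip(eigvals, eigvecs.T))
assert np.isclose(direct, via_spectrum)
```

Because the layer has no elementwise nonlinearity between the two linear maps, this quadratic-form rewrite is exact, which is what makes the weight-based spectral analysis possible in the first place.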