Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing
Authors: Kento Nishi, Rahul Ramesh, Maya Okawa, Mikail Khona, Hidenori Tanaka, Ekdeep Singh Lubana
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through evaluations of edited models on this task, we show that KE inadvertently affects representations of entities beyond the targeted one, distorting relevant structures that allow a model to infer unseen knowledge about an entity. We further corroborate our findings in naturalistic settings with pre-trained Llama and Mamba models as well. |
| Researcher Affiliation | Collaboration | 1Harvard College 2CBS-NTT Program in Physics of Intelligence, Harvard University 3Physics and Informatics Lab, NTT Research Inc. 4Computer and Information Science, University of Pennsylvania 5Department of Physics, Massachusetts Institute of Technology. |
| Pseudocode | Yes | Algorithm 1: Generate a single sequence containing a collection of facts. |
| Open Source Code | Yes | Please find the source code for our experiments at github.com/Kento_Nishi/KE-ICML-2025. |
| Open Datasets | Yes | To quantify model performance before and after editing, we adopt the MMLU-Redux reasoning benchmark (Gema et al., 2024) with the Zero Eval prompting framework (Lin, 2024) to elicit chain-of-thought reasoning. |
| Dataset Splits | No | The paper defines concepts like 'edit sub-graph,' 'retain sub-graph,' and 'test sub-graph' for knowledge editing. It also mentions that certain facts are 'held out' for logical and compositional inference tasks. Additionally, for ROME, it states: 'The covariance matrix C is estimated by randomly sampling 10^5 inputs from the validation dataset.' However, specific percentages or absolute sample counts for the main training/validation/test splits of the synthetic data generated are not provided, nor are explicit split details for the MMLU-Redux benchmark used in the LLM experiments. |
| Hardware Specification | No | The paper states: 'For all experiments (unless stated otherwise), we use a 2-layer nanoGPT Transformer (Karpathy, 2021).' It also mentions using 'pre-trained Llama and Mamba models.' However, no specific GPU models, CPU models, or other hardware specifications used for running the experiments are provided. |
| Software Dependencies | No | The paper mentions: 'Our Transformer model is a fork of the open-source nanoGPT repository (https://github.com/karpathy/nanoGPT).' It also states: 'The value optimization is performed using the Adam optimizer, with hyperparameters lr = 10^-3 and weight decay = 10^-4.' While these refer to software components and tools, specific version numbers for these software dependencies (e.g., Python version, PyTorch version, nanoGPT version) are not explicitly provided in the text. |
| Experiment Setup | Yes | We train a Transformer model using next-token prediction on the synthetic data generated from the above data generation process. For all experiments (unless stated otherwise), we use a 2-layer nanoGPT Transformer (Karpathy, 2021). Batch size: 256; context length: 16; optimizer: Adam; learning rate: 6 × 10^-4; training epochs: 1.5 × 10^5; decay iterations: 1.5 × 10^5; momentum: β1 = 0.9, β2 = 0.95; activation function: GeLU; block size: 16; embedding dimensions: 24; heads: 12. |
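The hyperparameters reported in the Experiment Setup row can be collected into a nanoGPT-style configuration sketch. This is a minimal illustration, not the authors' actual code: the field names follow nanoGPT's conventions and are assumptions; only the values come from the paper.

```python
# Hypothetical config sketch for the reported 2-layer nanoGPT-style
# Transformer. Field names are illustrative assumptions; the values
# are those stated in the paper's Experiment Setup row.
from dataclasses import dataclass

@dataclass
class TrainConfig:
    n_layer: int = 2            # "2-layer nano GPT Transformer"
    n_head: int = 12            # heads
    n_embd: int = 24            # embedding dimensions
    block_size: int = 16        # context length / block size
    batch_size: int = 256
    learning_rate: float = 6e-4
    beta1: float = 0.9          # Adam momentum
    beta2: float = 0.95
    max_iters: int = 150_000    # 1.5 × 10^5 reported training steps
    lr_decay_iters: int = 150_000

cfg = TrainConfig()
# Sanity check: the embedding must split evenly across attention heads,
# giving a per-head dimension of 24 / 12 = 2.
assert cfg.n_embd % cfg.n_head == 0
head_dim = cfg.n_embd // cfg.n_head
print(head_dim)
```

Note the unusually small per-head dimension (2), consistent with the paper's use of a deliberately tiny model for its synthetic study.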