Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing

Authors: Kento Nishi, Rahul Ramesh, Maya Okawa, Mikail Khona, Hidenori Tanaka, Ekdeep Singh Lubana

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through evaluations of edited models on this task, we show that KE inadvertently affects representations of entities beyond the targeted one, distorting relevant structures that allow a model to infer unseen knowledge about an entity. We further corroborate our findings in naturalistic settings with pre-trained Llama and Mamba models as well.
Researcher Affiliation | Collaboration | (1) Harvard College; (2) CBS-NTT Program in Physics of Intelligence, Harvard University; (3) Physics and Informatics Lab, NTT Research Inc.; (4) Computer and Information Science, University of Pennsylvania; (5) Department of Physics, Massachusetts Institute of Technology.
Pseudocode | Yes | Algorithm 1: Generate a single sequence containing a collection of facts.
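For intuition, a hypothetical sketch of what such a sequence generator might look like is given below; this is an illustration of the general idea (sampling fact triples and flattening them into one token sequence), not a reproduction of the paper's Algorithm 1, and the function and variable names are invented here.

```python
import random

def generate_sequence(facts, num_facts=4, seed=0):
    """Sample fact triples and concatenate them into one flat
    token sequence (subject, relation, object, subject, ...).

    `facts` is a list of (subject, relation, object) tuples;
    this toy generator is a hypothetical illustration only."""
    rng = random.Random(seed)
    chosen = rng.sample(facts, num_facts)
    seq = []
    for subject, relation, obj in chosen:
        seq.extend([subject, relation, obj])
    return seq

# Toy knowledge graph over abstract entity/relation tokens.
facts = [("e1", "r1", "e2"), ("e2", "r2", "e3"),
         ("e3", "r1", "e4"), ("e4", "r3", "e1")]
print(generate_sequence(facts, num_facts=2))
```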
Open Source Code | Yes | Please find the source code for our experiments at github.com/Kento_Nishi/KE-ICML-2025.
Open Datasets | Yes | To quantify model performance before and after editing, we adopt the MMLU-Redux reasoning benchmark (Gema et al., 2024) with the Zero Eval prompting framework (Lin, 2024) to elicit chain-of-thought reasoning.
Dataset Splits | No | The paper defines concepts like 'edit sub-graph,' 'retain sub-graph,' and 'test sub-graph' for knowledge editing, and mentions that certain facts are 'held out' for logical and compositional inference tasks. Additionally, for ROME, it states: 'The covariance matrix C is estimated by randomly sampling 10^5 inputs from the validation dataset.' However, specific percentages or absolute sample counts for the main training/validation/test splits of the generated synthetic data are not provided, nor are explicit split details for the MMLU-Redux benchmark used in the LLM experiments.
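The ROME detail quoted above (estimating the covariance matrix C from 10^5 sampled inputs) corresponds to an empirical uncentered second-moment estimate, C ≈ E[k kᵀ] over key vectors k. A minimal pure-Python sketch of that estimate, assuming `keys` holds the key vectors collected from the sampled inputs:

```python
def estimate_covariance(keys):
    """Empirical uncentered covariance C = (1/N) * sum_i k_i k_i^T,
    where `keys` is a list of N equal-length key vectors."""
    n, d = len(keys), len(keys[0])
    C = [[0.0] * d for _ in range(d)]
    for k in keys:
        for i in range(d):
            for j in range(d):
                C[i][j] += k[i] * k[j]
    for i in range(d):
        for j in range(d):
            C[i][j] /= n  # average the outer products
    return C
```

In practice this would be done with batched matrix products on the model's actual hidden states; the loop form here is only to make the estimator explicit.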
Hardware Specification | No | The paper states: 'For all experiments (unless stated otherwise), we use a 2-layer nano GPT Transformer (Karpathy, 2021).' It also mentions using 'pre-trained Llama and Mamba models.' However, no specific GPU models, CPU models, or other hardware specifications used for running the experiments are provided.
Software Dependencies | No | The paper mentions: 'Our Transformer model is a fork of the open-source nano GPT repository (https://github.com/karpathy/nano_GPT).' It also states: 'The value optimization is performed using the Adam optimizer, with hyperparameters lr = 10^-3 and weight decay = 10^-4.' While these refer to software components and tools, specific version numbers for these software dependencies (e.g., Python version, PyTorch version, nano GPT version) are not explicitly provided in the text.
Experiment Setup | Yes | We train a Transformer model using next-token prediction on the synthetic data generated from the above data generation process. For all experiments (unless stated otherwise), we use a 2-layer nano GPT Transformer (Karpathy, 2021). Batch size: 256; Context length: 16; Optimizer: Adam; Learning rate: 6 × 10^-4; Training epochs: 1.5 × 10^5; Decay iterations: 1.5 × 10^5; Momentum: β1 = 0.9, β2 = 0.95; Activation function: GeLU; Block size: 16; Embedding dimensions: 24; Heads: 12.
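The quoted "Learning rate: 6 × 10^-4" together with "Decay iterations: 1.5 × 10^5" suggests a nano GPT-style cosine learning-rate decay. A sketch of that schedule follows; note that the floor `min_lr = max_lr / 10` is an assumption carried over from nano GPT's usual defaults, not a value stated in the paper.

```python
import math

def cosine_lr(it, max_lr=6e-4, decay_iters=150_000, min_lr=6e-5):
    """Cosine decay from max_lr down to min_lr over decay_iters steps.
    min_lr = max_lr / 10 is an assumed nano GPT-style default."""
    if it >= decay_iters:
        return min_lr
    coeff = 0.5 * (1.0 + math.cos(math.pi * it / decay_iters))  # 1 -> 0
    return min_lr + coeff * (max_lr - min_lr)
```

Pairing this with Adam(β1 = 0.9, β2 = 0.95), batch size 256, and context length 16 would reproduce the quoted optimizer configuration under those assumptions.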