How Do Transformers Learn Variable Binding in Symbolic Programs?
Authors: Yiwei Wu, Atticus Geiger, Raphaël Millière
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our analysis reveals a developmental trajectory with three distinct phases during training: (1) random prediction of numerical constants, (2) a shallow heuristic prioritizing early variable assignments, and (3) the emergence of a systematic mechanism for dereferencing assignment chains. Using causal interventions, we find that the model learns to exploit the residual stream as an addressable memory space, with specialized attention heads routing information across token positions. This mechanism allows the model to dynamically track variable bindings across layers, resulting in accurate dereferencing. Our results show how Transformer models can learn to implement systematic variable binding without explicit architectural support, bridging connectionist and symbolic approaches. |
| Researcher Affiliation | Academia | ¹Pr(Ai)2R Group, ²Macquarie University. Correspondence to: Yiwei Wu <EMAIL>, Raphaël Millière <EMAIL>. |
| Pseudocode | No | The paper describes the synthetic program structure using a grammar in Appendix B, but it does not provide pseudocode or algorithm blocks for its own methodology or training process. |
| Open Source Code | No | To facilitate transparent and reproducible interpretability research, we developed Variable Scope, an interactive web platform that allows researchers to explore and verify our experimental findings. The platform includes interactive visualizations of the program structure, training checkpoint evaluation, model developmental trajectory, causal intervention experiments, and subspace experiments. This platform builds on previous efforts to present experimental results interactively, such as the Distill Circuits Thread, while providing more granular tools to visualize and analyze the evolution of a neural network over the course of training (Cammarata et al., 2020). Through Variable Scope, we aim to establish a new standard for open and collaborative mechanistic interpretability research: variablescope.org. |
| Open Datasets | No | We generate a dataset of 500,000 programs. Our splits are: training (450,000 programs, 90%), validation (1,000 programs, 0.2%), and testing (49,000 programs, 9.8%). |
| Dataset Splits | Yes | We generate a dataset of 500,000 programs. Our splits are: training (450,000 programs, 90%), validation (1,000 programs, 0.2%), and testing (49,000 programs, 9.8%). |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, etc.) were mentioned in the paper. |
| Software Dependencies | No | The model is trained for 15 epochs using the AdamW optimizer with β1 = 0.95, β2 = 0.999, and a base learning rate of 1 × 10⁻⁴ (Loshchilov & Hutter, 2017) with a batch size of 64 programs. For regularization, we applied dropout with a rate of 0.1 and a weight decay coefficient of 1 × 10⁻⁴. |
| Experiment Setup | Yes | Our model is architecturally similar to GPT-2 (Radford et al., 2019), with 37.8M parameters. The model has 12 layers, each with 8 attention heads with a head dimension of 64 and a residual stream dimension of 512. We implement LayerNorm (Ba et al., 2016) before each attention and MLP block, use rotary positional embeddings (RoPE) (Su et al., 2024), apply GELU activations between layers (Hendrycks & Gimpel, 2016), and a dropout rate of 0.1. We do not tie the input and output embedding weights. The model is trained for 15 epochs using the AdamW optimizer with β1 = 0.95, β2 = 0.999, and a base learning rate of 1 × 10⁻⁴ (Loshchilov & Hutter, 2017) with a batch size of 64 programs. The learning rate follows a linear decay schedule with warmup. Starting from zero, the learning rate linearly increases to 1 × 10⁻⁴ over 750 warmup steps, then linearly decays to zero over the remaining training period. For regularization, we applied dropout with a rate of 0.1 and a weight decay coefficient of 1 × 10⁻⁴. |
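The optimizer and learning-rate schedule quoted in the Experiment Setup row can be sketched in PyTorch as follows. The hyperparameters (β1 = 0.95, β2 = 0.999, base LR 1 × 10⁻⁴, weight decay 1 × 10⁻⁴, 750 warmup steps, linear decay to zero) come from the paper's text; the placeholder model and the `total_steps` value are assumptions, since the true step count depends on the 450,000-program training set and batch size of 64.

```python
# Hedged sketch of the reported AdamW + linear warmup/decay setup.
# The model here is a stand-in; only the optimizer/schedule values
# are taken from the paper.
import torch

model = torch.nn.Linear(512, 512)  # placeholder for the 12-layer GPT-2-style model

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,               # base learning rate
    betas=(0.95, 0.999),   # β1, β2 as reported
    weight_decay=1e-4,
)

warmup_steps = 750
total_steps = 10_000       # assumption: actual value depends on epochs x steps/epoch

def lr_lambda(step: int) -> float:
    """Linear warmup from 0 to the base LR, then linear decay to 0."""
    if step < warmup_steps:
        return step / warmup_steps
    return max(0.0, (total_steps - step) / (total_steps - warmup_steps))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```

The schedule multiplier is 0 at step 0, reaches 1.0 at the end of warmup, and returns to 0 at `total_steps`, matching the described linear warmup followed by linear decay.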