How Do Transformers Learn Variable Binding in Symbolic Programs?
Authors: Yiwei Wu, Atticus Geiger, Raphaël Millière
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our analysis reveals a developmental trajectory with three distinct phases during training: (1) random prediction of numerical constants, (2) a shallow heuristic prioritizing early variable assignments, and (3) the emergence of a systematic mechanism for dereferencing assignment chains. Using causal interventions, we find that the model learns to exploit the residual stream as an addressable memory space, with specialized attention heads routing information across token positions. This mechanism allows the model to dynamically track variable bindings across layers, resulting in accurate dereferencing. Our results show how Transformer models can learn to implement systematic variable binding without explicit architectural support, bridging connectionist and symbolic approaches. |
| Researcher Affiliation | Academia | ¹Pr(Ai)2R Group, ²Macquarie University. Correspondence to: Yiwei Wu <EMAIL>, Raphaël Millière <EMAIL>. |
| Pseudocode | No | The paper describes the synthetic program structure using a grammar in Appendix B, but it does not provide pseudocode or algorithm blocks for its own methodology or training process. |
| Open Source Code | No | To facilitate transparent and reproducible interpretability research, we developed Variable Scope, an interactive web platform that allows researchers to explore and verify our experimental findings. The platform includes interactive visualizations of the program structure, training checkpoint evaluation, model developmental trajectory, causal intervention experiments, and subspace experiments. This platform builds on previous efforts to present experimental results interactively, such as the Distill Circuits Thread, while providing more granular tools to visualize and analyze the evolution of a neural network over the course of training (Cammarata et al., 2020). Through Variable Scope, we aim to establish a new standard for open and collaborative mechanistic interpretability research: variablescope.org. |
| Open Datasets | No | We generate a dataset of 500,000 programs. Our splits are: training (450,000 programs, 90%), validation (1,000 programs, 0.2%), and testing (49,000 programs, 9.8%). |
| Dataset Splits | Yes | We generate a dataset of 500,000 programs. Our splits are: training (450,000 programs, 90%), validation (1,000 programs, 0.2%), and testing (49,000 programs, 9.8%). |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, etc.) were mentioned in the paper. |
| Software Dependencies | No | The model is trained for 15 epochs using the AdamW optimizer with β1 = 0.95, β2 = 0.999, and a base learning rate of 1 × 10⁻⁴ (Loshchilov & Hutter, 2017) with a batch size of 64 programs. For regularization, we applied dropout with a rate of 0.1 and a weight decay coefficient of 1 × 10⁻⁴. |
| Experiment Setup | Yes | Our model is architecturally similar to GPT-2 (Radford et al., 2019), with 37.8M parameters. The model has 12 layers, each with 8 attention heads with a head dimension of 64 and a residual stream dimension of 512. We implement LayerNorm (Ba et al., 2016) before each attention and MLP block, use rotary positional embeddings (RoPE) (Su et al., 2024), apply GELU activations between layers (Hendrycks & Gimpel, 2016), and a dropout rate of 0.1. We do not tie the input and output embedding weights. The model is trained for 15 epochs using the AdamW optimizer with β1 = 0.95, β2 = 0.999, and a base learning rate of 1 × 10⁻⁴ (Loshchilov & Hutter, 2017) with a batch size of 64 programs. The learning rate follows a linear decay schedule with warmup. Starting from zero, the learning rate linearly increases to 1 × 10⁻⁴ over 750 warmup steps, then linearly decays to zero over the remaining training period. For regularization, we applied dropout with a rate of 0.1 and a weight decay coefficient of 1 × 10⁻⁴. |
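The optimizer and learning-rate schedule quoted in the Experiment Setup row can be sketched in PyTorch as follows. The hyperparameters (β1 = 0.95, β2 = 0.999, base LR 1 × 10⁻⁴, weight decay 1 × 10⁻⁴, 750 warmup steps, linear decay to zero) come from the paper's text; the placeholder model and the `total_steps` value are assumptions, since the true step count depends on the 450,000-program training set and batch size of 64.

```python
# Hedged sketch of the reported AdamW + linear warmup/decay setup.
# The model here is a stand-in; only the optimizer/schedule values
# are taken from the paper.
import torch

model = torch.nn.Linear(512, 512)  # placeholder for the 12-layer GPT-2-style model

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,               # base learning rate
    betas=(0.95, 0.999),   # β1, β2 as reported
    weight_decay=1e-4,
)

warmup_steps = 750
total_steps = 10_000       # assumption: actual value depends on epochs x steps/epoch

def lr_lambda(step: int) -> float:
    """Linear warmup from 0 to the base LR, then linear decay to 0."""
    if step < warmup_steps:
        return step / warmup_steps
    return max(0.0, (total_steps - step) / (total_steps - warmup_steps))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```

The schedule multiplier is 0 at step 0, reaches 1.0 at the end of warmup, and returns to 0 at `total_steps`, matching the described linear warmup followed by linear decay.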