Emergent Symbol-like Number Variables in Artificial Neural Networks

Authors: Satchel Grant, Noah Goodman, James Lloyd McClelland

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. In this work, we interpret Neural Network (NN) solutions to sequence-based number tasks through a variety of methods to understand how well we can interpret them through the lens of interpretable Symbolic Algorithms (SAs): precise algorithms describable by rules operating on typed, mutable variables. We use GRUs, LSTMs, and Transformers trained using Next Token Prediction (NTP) on tasks where the correct tokens depend on numeric information that is only latent in the task structure. We show through multiple causal and theoretical methods that we can interpret raw NN activity through the lens of simplified SAs when we frame the neural activity in terms of neural subspaces rather than individual neurons. Using Distributed Alignment Search (DAS), we find that, depending on network architecture, dimensionality, and task specifications, alignments with SAs can be very high, while other times they are only approximate or fail altogether. We extend our analytic toolkit to address the failure cases by expanding the DAS framework to a broader class of alignment functions that more flexibly capture NN activity in terms of interpretable variables from SAs, and we provide theoretical and empirical explorations of Linear Alignment Functions (LAFs) in contrast to the preexisting Orthogonal Alignment Functions (OAFs).
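The DAS interchange intervention summarized above can be sketched as follows. This is an illustrative assumption, not the paper's implementation: the subspace size `k` is arbitrary, and a fixed NumPy rotation stands in for the learned orthogonal parameterization trained in the actual method.

```python
import numpy as np

def das_interchange(h_base, h_source, R, k):
    """Sketch of a DAS-style interchange intervention.

    h_base, h_source: hidden states (d,) from two forward passes.
    R: orthogonal alignment matrix (d, d); in DAS this is learned.
    k: size of the subspace hypothesized to encode the SA variable.
    """
    z_base = R @ h_base          # rotate into the aligned basis
    z_source = R @ h_source
    z_base[:k] = z_source[:k]    # swap the candidate variable subspace
    return R.T @ z_base          # rotate back to the original basis

# Toy usage with a random orthogonal R obtained via QR decomposition.
rng = np.random.default_rng(0)
R, _ = np.linalg.qr(rng.normal(size=(8, 8)))
h_new = das_interchange(rng.normal(size=8), rng.normal(size=8), R, k=2)
```

With `k = 0` the intervention is the identity, and with `k = d` it replaces the base state entirely; DAS searches for the rotation (and subspace) at which swapping causes the counterfactual behavior the SA predicts.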
Researcher Affiliation: Academia. Satchel Grant (EMAIL), Departments of Psychology and Computer Science, Stanford University; Noah D. Goodman (EMAIL), Departments of Psychology and Computer Science, Stanford University; James L. McClelland (EMAIL), Departments of Psychology and Computer Science, Stanford University.
Pseudocode: Yes. We include Algorithms 1, 2, and 3 in the supplement, which show the pseudocode used to implement the Up-Down, Up-Up, and Ctx-Distr programs in simulations. Refer to Figure 1 for an illustration of the Up-Down strategy and the Ctx-Distr strategy that is observed in some transformers.
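The Up-Down program mentioned above can be sketched in a few lines: count up over demonstration tokens, then count down while emitting response tokens, ending with EOS at zero. This is a hedged illustration, not the supplement's Algorithm 1; the token names `"D"`, `"T"`, `"R"` are assumptions.

```python
def up_down(seq, demo="D", trigger="T", resp="R", eos="EOS"):
    """Illustrative sketch of the Up-Down strategy (token names assumed)."""
    count = 0
    for tok in seq:
        if tok == demo:
            count += 1        # count up during the demonstration phase
        elif tok == trigger:
            break             # trigger switches to the response phase
    out = []
    while count > 0:
        out.append(resp)      # count down while producing responses
        count -= 1
    out.append(eos)
    return out

# e.g. up_down(["BOS", "D", "D", "D", "T"]) -> ["R", "R", "R", "EOS"]
```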
Open Source Code: No. The paper does not contain an explicit statement about releasing code or a link to a code repository for the described methodology.
Open Datasets: No. Each task we consider is defined by a Next-Token Prediction (NTP) task over sequences, as in the example shown in Figure 1. The goal of the task is to reproduce the same number of response tokens as demonstration tokens observed before the Trigger (T) token. Each sequence starts with a Beginning of Sequence (BOS) token and ends with an End of Sequence (EOS) token. Each sequence is generated by first uniformly sampling an object quantity from the inclusive range of 1 to 20, where 20 was chosen to match the human experiments of Pitt et al. (2022).
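The sequence generation described above can be sketched as follows; single demo/response token types and the token names are illustrative assumptions, matching only the BOS/Trigger/EOS structure quoted from the paper.

```python
import random

def make_sequence(rng, lo=1, hi=20, demo="D", trigger="T", resp="R"):
    """Sketch of the NTP task's sequence generation (token names assumed).

    The object quantity is sampled uniformly from the inclusive
    range [lo, hi], with hi=20 as in the paper.
    """
    n = rng.randint(lo, hi)  # inclusive on both ends
    return ["BOS"] + [demo] * n + [trigger] + [resp] * n + ["EOS"]

rng = random.Random(0)
seq = make_sequence(rng)
```

Under NTP, the model is trained to predict each next token; the numeric quantity is never given as an explicit token, only latently as the count of demonstration tokens.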
Dataset Splits: Yes. During model training, we hold out the object quantities 4, 9, 14, and 17 as a way to examine generalization. We chose 4, 9, 14, and 17 to semi-uniformly cover the space of possible numbers while including even, odd, and prime numbers and ensuring that training covered 3 examples at both ends of the training range. A trial is considered correct when all response tokens and the EOS token are correctly predicted by the model after the trigger. ... We use 10,000 intervention samples for training and 1,000 samples for validation and testing.
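The held-out split above amounts to filtering sampled quantities by membership in the held-out set; a minimal sketch (function and variable names are ours, not the paper's):

```python
HELD_OUT = {4, 9, 14, 17}  # quantities reserved for generalization tests

def split_quantity(n):
    """Route a sampled object quantity to the train or held-out set."""
    return "heldout" if n in HELD_OUT else "train"

# 16 of the 20 possible quantities remain available for training.
train_quantities = [n for n in range(1, 21) if split_quantity(n) == "train"]
```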
Hardware Specification: Yes. All artificial neural network models were implemented and trained using PyTorch (Paszke et al., 2019) on Nvidia Titan X GPUs.
Software Dependencies: Yes. All artificial neural network models were implemented and trained using PyTorch (Paszke et al., 2019) on Nvidia Titan X GPUs. ... We orthogonalize the rotation matrix using PyTorch's orthogonal parameterization with default settings. ... We use a learning rate of 0.001 and an Adam optimizer. ... Each model used a two-layer multi-layer perceptron (MLP) with GELU nonlinearities, with a hidden layer size of 4 times the hidden state dimensionality and 50% dropout on the hidden layer.
Experiment Setup: Yes. All models used an embedding and hidden state size of 128 dimensions. To make the token predictions, each model used a two-layer multi-layer perceptron (MLP) with GELU nonlinearities, with a hidden layer size of 4 times the hidden state dimensionality and 50% dropout on the hidden layer. The GRU and LSTM model variants each consisted of a single recurrent cell followed by the output MLP. Unless otherwise stated, the transformer architecture consisted of two layers using Rotary positional encodings (Su et al., 2023). Each model variant used the same learning rate scheduler, which followed the original transformer (Vaswani et al., 2017) schedule of warmup followed by decay. We used 100 warmup steps, a maximum learning rate of 0.0001, a minimum of 1e-7, and a decay rate of 0.5. We used a batch size of 128, which caused each epoch to consist of 8 gradient update steps.
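The warmup-then-decay schedule quoted above can be sketched as follows. The exact functional form is an assumption: linear warmup to the maximum, then power-law decay with exponent 0.5 (the inverse-square-root shape of the original Transformer schedule), floored at the stated minimum.

```python
def lr_at(step, warmup=100, lr_max=1e-4, lr_min=1e-7, decay=0.5):
    """Sketch of the warmup-then-decay learning rate schedule.

    Linear warmup over `warmup` steps to lr_max, then power-law decay
    with exponent `decay`, never dropping below lr_min. The precise
    form is assumed, not taken from the paper.
    """
    if step < warmup:
        return lr_max * (step + 1) / warmup                  # linear warmup
    return max(lr_min, lr_max * (warmup / (step + 1)) ** decay)
```

In PyTorch this shape is typically wired up via `torch.optim.lr_scheduler.LambdaLR` wrapping a function like the one above.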