Lines of Thought in Large Language Models
Authors: Raphaël Sarfati, Toni Liu, Nicolas Boullé, Christopher Earls
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigate which large-scale, ensemble properties can be inferred experimentally without concern for the microscopic details. Specifically, we are interested in the trajectories, or lines of thought (LoT), that embedded tokens realize in the latent space when passing through successive transformer layers (Aubry et al., 2024). By splitting a large input text into N-token sequences, we study LoT ensemble properties to shed light on the internal, average processes that characterize transformer transport. The results presented in Fig. 5 show that the simulated ensembles closely reproduce the ground truth of true trajectory distributions. |
| Researcher Affiliation | Academia | Raphaël Sarfati, School of Civil and Environmental Engineering, Cornell University, USA, EMAIL; Toni J.B. Liu, Department of Physics, Cornell University, USA, EMAIL; Nicolas Boullé, Department of Mathematics, Imperial College London, UK, EMAIL; Christopher J. Earls, Center for Applied Mathematics, School of Civil and Environmental Engineering, Cornell University, USA, EMAIL |
| Pseudocode | Yes | Algorithm 1 Trajectory generation in transformer-based model |
| Open Source Code | Yes | Code for trajectory generation, visualization, and analysis is available on GitHub at https://github.com/rapsar/lines-of-thought. |
| Open Datasets | Yes | The main corpus in this study comes from Henry David Thoreau's Walden, obtained from the Gutenberg Project (Project Gutenberg, 2024). |
| Dataset Splits | Yes | We generate inputs by tokenizing (Wolf et al., 2020) a large text and then chopping it into "pseudo-sentences", i.e., chunks of a fixed number of tokens Nk (see Algorithm 1). Unless otherwise noted, Nk = 50. These non-overlapping chunks are consistent in terms of token cardinality, and possess the structure of language, but have various meanings and endings (see Appendix A.1). The main corpus in this study comes from Henry David Thoreau's Walden... We typically use a set of Ns ≈ 3000–14000 pseudo-sentences. |
| Hardware Specification | No | The paper mentions various LLM models (GPT-2 medium, Llama 2 7B, Mistral 7B, Llama 3.2) but does not specify the hardware used to run or analyze these models (e.g., specific GPU or CPU models, memory, or cloud resources). |
| Software Dependencies | No | The paper mentions using specific LLMs like GPT-2, Llama 2, Mistral, and Llama 3.2, and references tokenizing with Wolf et al., 2020 (Hugging Face's transformers), but it does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used for the analysis. |
| Experiment Setup | Yes | Language models. We rely primarily on the 355M-parameter ("medium") version of the GPT-2 model (Radford et al., 2019)... We later extend our analysis to the Llama 2 7B (Touvron et al., 2023), Mistral 7B v0.1 (Jiang et al., 2023), and small Llama 3.2 models (1B and 3B) (Meta AI, 2024). Input ensembles. We generate inputs by tokenizing (Wolf et al., 2020) a large text and then chopping it into "pseudo-sentences", i.e., chunks of a fixed number of tokens Nk (see Algorithm 1). Unless otherwise noted, Nk = 50. The main corpus in this study comes from Henry David Thoreau's Walden... We typically use a set of Ns ≈ 3000–14000 pseudo-sentences. Trajectory collection. We form trajectories by collecting the successive vector outputs, within the latent space, after each transformer layer (hidden_states). |
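
The pseudo-sentence chunking and hidden-state collection described in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' released code (which is on GitHub): the chunking mirrors Algorithm 1's fixed-Nk, non-overlapping splits, while `collect_trajectories` assumes a Hugging Face `transformers` causal LM (e.g. GPT-2 medium) whose `output_hidden_states=True` option exposes per-layer activations; the function and variable names here are illustrative.

```python
def chunk_into_pseudo_sentences(token_ids, n_k=50):
    """Split a token-id sequence into non-overlapping chunks of exactly n_k tokens.

    Mirrors the paper's "pseudo-sentences": fixed token cardinality,
    with any trailing remainder shorter than n_k dropped.
    """
    n_chunks = len(token_ids) // n_k
    return [token_ids[i * n_k:(i + 1) * n_k] for i in range(n_chunks)]


def collect_trajectories(model, tokenizer, text, n_k=50):
    """Collect per-layer hidden states (the "lines of thought") for each pseudo-sentence.

    Sketch only: assumes a Hugging Face transformers model/tokenizer pair.
    """
    import torch  # imported lazily so the chunking helper stays dependency-free

    token_ids = tokenizer.encode(text)
    trajectories = []
    for chunk in chunk_into_pseudo_sentences(token_ids, n_k):
        input_ids = torch.tensor([chunk])
        with torch.no_grad():
            out = model(input_ids, output_hidden_states=True)
        # out.hidden_states is a tuple of (1 + num_layers) tensors, each of
        # shape (1, n_k, hidden_dim). Tracking one token's embedding after
        # each successive layer traces its trajectory through latent space.
        trajectories.append([h[0, -1, :] for h in out.hidden_states])
    return trajectories
```

With Nk = 50 and a Walden-sized corpus, the chunking step alone yields the Ns ≈ 3000–14000 pseudo-sentence ensembles the paper reports; each trajectory is then one point per layer in the model's latent space.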