Language Models Are Implicitly Continuous

Authors: Samuele Marro, Davide Evangelista, X. Huang, Emanuele La Malfa, Michele Lombardi, Michael Wooldridge

ICLR 2025

Reproducibility Assessment (variable, result, supporting evidence)
Research Type: Experimental. Evidence: By running experiments on state-of-the-art LLMs, we find that the language LLMs learn is implicitly continuous: they are able to handle, with minor modifications, inputs that are both time-continuous and space-continuous. In particular, we formally show that the results obtained by extending pretrained LLMs to handle time-continuous inputs depend strongly on a quantity, named duration, associated with each sentence. We also show in Section 4 that the semantics of this continuum deviate significantly from human intuition.
Researcher Affiliation: Academia. Evidence: (1) Department of Engineering Science, University of Oxford, Oxford, UK; (2) Department of Computer Science, University of Bologna, Bologna, Italy; (3) Department of Computer Science, ETH Zurich, Zurich, Switzerland; (4) Department of Computer Science, University of Oxford, Oxford, UK.
Pseudocode: No. The paper provides mathematical derivations and descriptions of modifications to the Transformer architecture (Appendices A and B.1), but does not include any clearly labeled pseudocode blocks or algorithms formatted as code.
Open Source Code: Yes. Evidence: Our code is available at https://github.com/samuelemarro/continuous-llm-experiments.
Open Datasets: Yes. Evidence: We quantitatively study this phenomenon by repeating this experiment on a dataset of 200 word-counting tasks. We consider the sequential dataset from Lin et al. (2024), which contains 200 curated how-to tutorials split by step.
Dataset Splits: No. The paper mentions a 'dataset of 200 word counting tasks' and 'the sequential dataset from Lin et al. (2024), which contains 200 curated how-to tutorials split by step', but it does not specify explicit training, validation, or test splits for these datasets in its own experiments. The experiments probe pre-trained LLMs rather than training a new model that would require such splits.
Hardware Specification: No. The paper does not state the hardware used to run its experiments (e.g., GPU models, CPU types, or cloud instance specifications). It evaluates state-of-the-art Large Language Models (LLMs), including Llama2, Llama3, Phi3, Gemma, Gemma2, and Mistral, but does not specify the hardware on which these models were run.
Software Dependencies: No. The paper states: 'In our experiments, we used Hugging Face, which natively supports 1. and 3. and can be easily adapted to support 2.' While Hugging Face is mentioned as a tool, no version numbers are provided for the Hugging Face libraries or for other critical software dependencies (e.g., PyTorch, Python).
Experiment Setup: Yes. Evidence: Experiment-specific parameters are reported in the respective subsections of Appendix C.4. CCTs can be implemented with little effort by starting from the implementation of a regular Transformer and applying three modifications: (1) accepting arbitrary embeddings, rather than only tokens; (2) allowing positional indices to be floating-point values, instead of only integers; (3) adding support for custom floating-point attention masks. For single-token continuity, we shrink the subset of considered tokens by a coefficient in the range [0.1, 1]. We then interpolate (with 40 steps) between the sentence containing one object or the other.
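Two of the ingredients above, floating-point positional indices and interpolation between sentence embeddings with 40 steps, can be sketched in isolation. The following is a minimal illustrative example, not the authors' implementation (their repository contains the actual Hugging Face-based code): a sinusoidal positional encoding evaluated at a float position, and a linear interpolation path between two embedding vectors. Function names and the embedding dimension are hypothetical.

```python
import math

def sinusoidal_embedding(position, dim):
    # Standard sinusoidal positional encoding, evaluated at a *float*
    # position rather than an integer index (modification 2 above).
    emb = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        emb.append(math.sin(position * freq))
        emb.append(math.cos(position * freq))
    return emb[:dim]

def interpolate(emb_a, emb_b, steps):
    # Linear interpolation between two embeddings (e.g., the embeddings of
    # two sentences differing in one object), with `steps` points in total;
    # the paper's experiments use 40 steps.
    path = []
    for s in range(steps):
        t = s / (steps - 1)
        path.append([(1 - t) * a + t * b for a, b in zip(emb_a, emb_b)])
    return path

# Hypothetical usage: a 40-step path between two positions' encodings.
start = sinusoidal_embedding(0.0, 8)
end = sinusoidal_embedding(1.0, 8)
path = interpolate(start, end, steps=40)
```

The sketch uses linear interpolation in embedding space; how the interpolated embeddings are fed to the model (via the arbitrary-embedding input of modification 1) is left to the actual implementation.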