Non-Asymptotic Length Generalization

Authors: Thomas Chen, Tengyu Ma, Zhiyuan Li

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper provides a more fine-grained analysis of length generalization by giving a non-asymptotic guarantee on the minimum input length required to achieve length generalization, as a function of the complexity of the ground-truth function. More specifically, it makes the following contributions: In Section 2, we introduce the framework of non-asymptotic length generalization (Definition 2.6), which requires a non-asymptotic upper bound on the minimum input length that guarantees length generalization, given the complexity of the ground-truth function under some complexity measure C; the latter quantity we call the length complexity under C (Definition 3.1). We show that the Minimum-Complexity Interpolator learning algorithm Amci, instantiated with complexity measure C, is optimal with respect to the length complexity under C. As a concrete example, we show that Amci only needs inputs of length up to 2c − 2 to learn any ground-truth Deterministic Finite Automaton (DFA) with c states. In Section 3, we show that whether a hypothesis class admits non-asymptotic length generalization is equivalent to whether the problem of deciding equivalence of two finite descriptions of hypotheses is decidable. As a consequence, Context-Free Grammars (CFGs) do not admit non-asymptotic length generalization, though they do admit (asymptotic) length generalization in the limit. In Section 5, we prove non-asymptotic length generalization for a subset of the transformer-related hypothesis class C-RASP (Yang & Chiang, 2024).
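The DFA claim above (agreement on all short inputs is enough to pin down the language) can be illustrated with a short brute-force check. This is an illustrative sketch only, not the paper's code; the DFAs and helper names here are invented for the example. Two DFAs with at most c states each that agree on every string of length up to 2c − 2 must accept the same language, so checking up to that bound decides equivalence:

```python
from itertools import product

def run_dfa(dfa, s):
    """Run a DFA (dict with 'start', 'accept', 'delta') on string s."""
    state = dfa["start"]
    for ch in s:
        state = dfa["delta"][(state, ch)]
    return state in dfa["accept"]

def agree_up_to(d1, d2, max_len, alphabet="01"):
    """Check whether two DFAs agree on every string of length <= max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if run_dfa(d1, s) != run_dfa(d2, s):
                return False
    return True

# "Even number of 1s", minimal 2-state form.
even2 = {"start": 0, "accept": {0},
         "delta": {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}}
# Same language, redundant 3-state form (states 0 and 2 are both "even").
even3 = {"start": 0, "accept": {0, 2},
         "delta": {(0, "0"): 2, (0, "1"): 1, (1, "0"): 1,
                   (1, "1"): 2, (2, "0"): 0, (2, "1"): 1}}
# Different language: number of 1s divisible by 3.
mod3 = {"start": 0, "accept": {0},
        "delta": {(q, a): (q + (a == "1")) % 3 for q in range(3) for a in "01"}}

c = 3              # max state count among the DFAs compared
bound = 2 * c - 2  # agreement up to this length decides equivalence

print(agree_up_to(even2, even3, bound))  # True: equivalent languages
print(agree_up_to(even2, mod3, bound))   # False: they differ already at "11"
```

Note that `even2` and `mod3` do agree on all strings of length at most 1, so a shorter check would wrongly declare them equivalent; the 2c − 2 bound is what makes the finite check sound.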
Researcher Affiliation | Academia | ¹Department of Computer Science, Stanford University, Stanford, USA; ²Toyota Technological Institute at Chicago, Chicago, USA. Correspondence to: Thomas Chen <EMAIL>.
Pseudocode | Yes | Algorithm 1: Minimum-Complexity Interpolator (A^{R,C}_mci). Hyperparameters: complexity measure C, encoding system R. Input: finite training dataset S ⊆ {(x, y) | x ∈ {0,1}*, y ∈ {0,1}}. Output: arg min_{p ∈ {0,1}* : y = R(p)(x) for all (x, y) ∈ S} C(p)
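For intuition, Algorithm 1 can be instantiated concretely with DFAs as the hypotheses and state count as the complexity measure C. The following is a brute-force sketch under those assumptions (the function names and tiny search space are invented for illustration; the paper states the algorithm abstractly over encodings R):

```python
from itertools import product

def run(delta, accept, s):
    """Run a DFA given as (transition dict, accept set); start state is 0."""
    state = 0
    for ch in s:
        state = delta[(state, ch)]
    return state in accept

def mci_dfa(dataset, max_states=3, alphabet="01"):
    """Minimum-Complexity Interpolator with C = number of DFA states:
    return the smallest DFA consistent with every labeled example."""
    m = len(alphabet)
    for k in range(1, max_states + 1):            # increasing complexity
        for trans in product(range(k), repeat=k * m):
            delta = {(q, a): trans[q * m + i]
                     for q in range(k) for i, a in enumerate(alphabet)}
            for bits in product([False, True], repeat=k):
                accept = {q for q in range(k) if bits[q]}
                if all(run(delta, accept, x) == y for x, y in dataset):
                    return k, delta, accept       # first hit is minimal in C
    return None

# Label every string of length <= 3 with the "even number of 1s" function.
data = [("".join(t), t.count("1") % 2 == 0)
        for n in range(4) for t in product("01", repeat=n)]
k, delta, accept = mci_dfa(data)
print(k)  # 2: no 1-state (constant) DFA can fit both labels
```

Because the ground truth here has c = 2 states and the dataset covers all strings up to length 3, the returned 2-state interpolator is not just consistent with the training data but equivalent to the ground-truth DFA, in line with the paper's guarantee on the minimum input length needed.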
Open Source Code | No | The paper does not contain any explicit statement about releasing code, nor does it provide a link to a code repository. The acknowledgment of the Stanford HAI Google Cloud Credits Program does not constitute a code release.
Open Datasets | No | The paper primarily deals with theoretical frameworks and complexity measures for function classes such as DFAs, CFGs, and C-RASP. It does not describe experiments on specific datasets, so no open datasets are reported.
Dataset Splits | No | The paper is theoretical and focuses on length generalization in an idealized setting, primarily using mathematical proofs and definitions of function classes. It does not conduct experiments on datasets that would require training/test/validation splits.
Hardware Specification | No | The paper is theoretical and does not describe any specific experiments that would require detailing hardware specifications. The acknowledgment of the "Stanford HAI Google Cloud Credits Program" does not specify actual hardware used for experimental runs.
Software Dependencies | No | The paper focuses on theoretical aspects of length generalization and does not describe any specific software implementations or dependencies with version numbers for experimental reproducibility.
Experiment Setup | No | The paper is purely theoretical, providing provable guarantees and formalizing frameworks. It does not describe any experimental setups, hyperparameters, or training settings for practical implementations.