Non-Asymptotic Length Generalization
Authors: Thomas Chen, Tengyu Ma, Zhiyuan Li
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper provides a more fine-grained analysis of length generalization by giving a non-asymptotic guarantee on the minimum input length required to achieve length generalization, as a function of the complexity of the ground-truth function. More specifically, we make the following contributions: In Section 2, we introduce the framework of non-asymptotic length generalization (Definition 2.6), which requires a non-asymptotic upper bound on the minimum input length that guarantees length generalization, given the complexity of the ground-truth function under some given complexity measure C; the latter we call the length complexity under C (Definition 3.1). We show that the Minimum-Complexity Interpolator learning algorithm A_mci, instantiated with complexity measure C, is optimal with respect to the length complexity under C. As a concrete example, we show that A_mci only needs inputs of length up to 2c − 2 to learn any ground-truth Deterministic Finite Automaton (DFA) with c states. In Section 3, we show that whether a hypothesis class admits non-asymptotic length generalization is equivalent to whether the problem of deciding equivalence of two finite descriptions of hypotheses is decidable. As a consequence, Context-Free Grammars (CFGs) do not admit non-asymptotic length generalization, though they do admit (asymptotic) length generalization in the limit. In Section 5, we prove non-asymptotic length generalization for a subset of the transformer-related hypothesis class C-RASP (Yang & Chiang, 2024). |
| Researcher Affiliation | Academia | 1Department of Computer Science, Stanford University, Stanford, USA 2Toyota Technological Institute at Chicago, Chicago, USA. Correspondence to: Thomas Chen <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Minimum-Complexity Interpolator (A^{R,C}_mci). Hyperparameters: complexity measure C, encoding system R. Input: finite training dataset S ⊆ {(x, y) \| x ∈ {0,1}*, y ∈ {0,1}}. Output: arg min_{p ∈ {0,1}* : R(p)(x) = y for all (x, y) ∈ S} C(p) |
| Open Source Code | No | The paper does not contain any explicit statement about releasing code or provide a link to a code repository. The acknowledgment of the Stanford HAI Google Cloud Credits Program doesn't constitute a code release. |
| Open Datasets | No | The paper deals with theoretical frameworks and complexity measures for function classes such as DFAs, CFGs, and C-RASP. It does not describe experiments on specific datasets, so there are no open datasets to report. |
| Dataset Splits | No | The paper is theoretical and focuses on length generalization in an idealized setting, primarily using mathematical proofs and definitions of function classes. It does not conduct experiments on datasets that would require training/test/validation splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific experiments that would require detailing hardware specifications. The acknowledgment of "Stanford HAI Google Cloud Credits Program" does not specify actual hardware used for experimental runs. |
| Software Dependencies | No | The paper focuses on theoretical aspects of length generalization and does not describe any specific software implementations or dependencies with version numbers for experimental reproducibility. |
| Experiment Setup | No | The paper is purely theoretical, providing provable guarantees and formalizing frameworks. It does not describe any experimental setups, hyperparameters, or training settings for practical implementations. |
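The DFA bound quoted in the Research Type row rests on a classical fact: two inequivalent complete DFAs with c₁ and c₂ states disagree on some string of length at most c₁ + c₂ − 2, found by breadth-first search over their product automaton (so at most 2c − 2 when both have c states). A minimal sketch of that search follows; it is not code from the paper, and the function name and the triple-based DFA encoding are illustrative choices:

```python
from collections import deque

def shortest_distinguishing_string(dfa1, dfa2, alphabet):
    """Breadth-first search over the product automaton of two complete DFAs.

    Each DFA is a triple (start_state, accept_states, delta), where delta
    maps (state, symbol) -> state. Returns the shortest string on which the
    two DFAs disagree, or None if they are equivalent. Since BFS visits each
    product state once, any distinguishing string it returns has length at
    most |Q1| + |Q2| - 2, i.e. at most 2c - 2 when both DFAs have c states.
    """
    start = (dfa1[0], dfa2[0])
    seen = {start}
    queue = deque([(start, "")])
    while queue:
        (s1, s2), w = queue.popleft()
        if (s1 in dfa1[1]) != (s2 in dfa2[1]):
            return w  # the two machines disagree on w
        for a in alphabet:
            nxt = (dfa1[2][(s1, a)], dfa2[2][(s2, a)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, w + a))
    return None  # no reachable disagreement: the DFAs are equivalent

# Example: "even number of 1s" (2 states) vs. "accept everything" (1 state).
even_ones = (0, {0}, {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0})
accept_all = ("q", {"q"}, {("q", "0"): "q", ("q", "1"): "q"})
print(shortest_distinguishing_string(even_ones, accept_all, "01"))  # -> "1"
```

In the example the returned string "1" has length 1 = 2 + 1 − 2, meeting the bound exactly. Intuitively, this is why short training inputs can suffice for a minimum-complexity interpolator over DFAs: any low-complexity hypothesis that differs from the ground truth is already refuted by some short input.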