State space models can express $n$-gram languages
Authors: Vinoth Nandakumar, Qiang Qu, Peng Mi, Tongliang Liu
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments with a small dataset generated from n-gram rules to show how our framework can be applied to SSMs and RNNs obtained through gradient-based optimization. |
| Researcher Affiliation | Academia | Vinoth Nandakumar EMAIL Sydney AI Centre, University of Sydney; Qiang Qu EMAIL Sydney AI Centre, University of Sydney; Peng Mi EMAIL Sydney AI Centre, University of Sydney; Tongliang Liu EMAIL Sydney AI Centre, University of Sydney |
| Pseudocode | No | The paper describes methods using mathematical equations and prose, but does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about the release of source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We generate data using the language L which is defined in Appendix B using a list of n-gram rules, and explained with a simplified diagram in Section 3.2. ... The table below has a complete list of all n-grams in P; note that |P| = 145. |
| Dataset Splits | No | We evaluate the model's accuracy on an unseen test set; each sentence in the test set is truncated to a length of 6, and the model's completion is evaluated by checking if the resulting sentence lies in our dataset. After replicating the experiment 5 times with different random seeds, we find the model consistently achieves near-perfect accuracies when using a training dataset with 40 sentences. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | The model is trained for 25 epochs with a fixed learning rate of 0.001, using stochastic gradient descent with the cross-entropy loss function. |
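The dataset construction and evaluation protocol quoted above (sentences generated from a fixed list of n-gram rules; a completion counts as correct iff the completed sentence lies in the language) can be sketched in a few lines. This is a minimal illustration only: the bigram rules below are hypothetical stand-ins, since the paper's actual language L and its 145 n-grams are defined in Appendix B, and the helper names (`in_language`, `generate`) are our own.

```python
import random

# Hypothetical bigram rules standing in for the paper's language L
# (the real 145 n-grams are listed in the paper's Appendix B).
BIGRAMS = {
    ("the", "cat"), ("the", "dog"), ("cat", "sat"), ("dog", "ran"),
    ("sat", "down"), ("ran", "home"), ("down", "."), ("home", "."),
}

def in_language(sentence):
    """A sentence is in L iff every adjacent word pair is an allowed bigram."""
    return all((a, b) in BIGRAMS for a, b in zip(sentence, sentence[1:]))

def generate(start, length, rng):
    """Sample a sentence of the given length by following the bigram rules."""
    sent = [start]
    while len(sent) < length:
        options = sorted(b for (a, b) in BIGRAMS if a == sent[-1])
        if not options:
            break  # no rule continues this word
        sent.append(rng.choice(options))
    return sent

rng = random.Random(0)
sample = generate("the", 4, rng)
# Evaluation mirrors the paper's protocol: a model completion of a truncated
# prefix is scored correct iff the full sentence is in the language.
print(sample, in_language(sample))
```

Under the paper's protocol a trained model would replace `generate` for the completion step; the `in_language` membership check is then the accuracy criterion applied to each completed test sentence.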