Generalizing Reasoning Problems to Longer Lengths

Authors: Changnan Xiao, Bing Liu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In the empirical study, we introduce the CoT schemes for reasoning problems like arithmetic, parity, addition, multiplication, and division to train a Transformer to achieve LG for these problems. Our experiments verify (1) for a CoT scheme of a problem, if it is (n, r)-consistent, it is solvable for LG, and (2) for the same problem, one CoT scheme may not be solvable for LG, but another may. Fig. 1 shows the problem achieves 100% accuracy for all test sets as the problem is (1, 17)-consistent.
Researcher Affiliation | Academia | Bing Liu, Department of Computer Science, University of Illinois Chicago, EMAIL
Pseudocode | No | The paper describes algorithms and methods but does not present any formal pseudocode blocks or algorithm sections. For instance, the proof sketch for Theorem 3.6 describes steps like "The 1st layer applies a local padding mask... The 1st feed-forward layer maps... The 2nd attention layer has no mask..." but these are descriptive, not pseudocode.
Open Source Code | Yes | The code of our system can be downloaded at https://openreview.net/forum?id=zpENPcQSj1.
Open Datasets | No | Every training or test set is generated independently. The training set and each test set are generated in the same way for each problem, except that for the training set we also generate CoT steps for each problem instance based on the individual CoT schemes, while for the test sets we do not. The paper describes a custom data generation process and does not refer to any pre-existing public datasets or provide public access links to its generated data.
Dataset Splits | Yes | We use 6 test sets to evaluate the model learned for each problem. The 5 columns marked LG Test i in Table 1 give the length ranges of the 5 test sets for each problem, where the maximum lengths of the test sets increase gradually. The first test set has the same length range as that of the training set and thus shares the Train Length column. Every test set consists of 1k questions (test problem instances), which are in sequence format with no CoT steps, e.g., 3 + 2 2. The training data for each task contains 12.8M CoT steps.
Hardware Specification | Yes | Each experiment is run on a machine with 8 CPU cores.
Software Dependencies | No | The optimizer is Adam and the learning rate is 0.0001. The paper names Adam as the optimizer but does not specify versions for any programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | The optimizer is Adam and the learning rate is 0.0001. The training data for each task contains 12.8M CoT steps. Due to the complexity of multiplication and division, they are additionally trained on 25.6M CoT steps with learning rate 0.000005.
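As a concrete illustration of the generation scheme quoted under Open Datasets and Dataset Splits (training instances carry CoT steps, test instances are bare sequences), the sketch below builds parity instances. The serialization format, the delimiters, and the function name `make_parity_instance` are hypothetical, since the paper does not specify them:

```python
import random

def make_parity_instance(length, with_cot):
    """Generate one parity problem instance.

    Hypothetical format: the question is the bit sequence; the CoT
    steps (training only) list the running parity after each bit.
    """
    bits = [random.randint(0, 1) for _ in range(length)]
    question = " ".join(map(str, bits))
    answer = sum(bits) % 2
    if not with_cot:
        # Test instances: question and answer only, no CoT steps.
        return f"{question} = {answer}"
    # Training instances: emit the running parity after each bit as CoT.
    running, steps = 0, []
    for b in bits:
        running ^= b
        steps.append(str(running))
    return f"{question} : {' '.join(steps)} = {answer}"

# Training sets would use with_cot=True; each of the test sets,
# drawn from a longer length range, would use with_cot=False.
train_example = make_parity_instance(5, with_cot=True)
test_example = make_parity_instance(20, with_cot=False)
```

This mirrors the quoted split design only in outline; the paper's actual CoT schemes for arithmetic, addition, multiplication, and division are problem-specific and not reproduced here.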
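The quoted hyperparameters (Adam, learning rate 0.0001, and 0.000005 for the additional multiplication/division training) can be collected into a minimal training configuration. PyTorch is an assumption here, as the paper does not name its framework, and the model dimensions are illustrative only:

```python
import torch
import torch.nn as nn

# Learning rates quoted in the paper; everything else is illustrative.
BASE_LR = 1e-4        # standard training, 12.8M CoT steps per task
HARD_TASK_LR = 5e-6   # additional 25.6M steps for multiplication/division

model = nn.Transformer(d_model=64, nhead=4)  # sizes not from the paper
optimizer = torch.optim.Adam(model.parameters(), lr=BASE_LR)

# For the additional multiplication/division training phase, the
# learning rate is lowered in place on the same optimizer.
for group in optimizer.param_groups:
    group["lr"] = HARD_TASK_LR
```

The two-phase schedule (base rate, then a much smaller rate for the harder tasks) follows the quoted setup; how the paper actually switches rates between phases is not specified.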