Progressive Self-Learning for Domain Adaptation on Symbolic Regression of Integer Sequences
Authors: Yaohui Zhu, Kaiming Sun, Zhengdong Luo, Lingfeng Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on OEIS datasets demonstrate that the proposed method surpasses current state-of-the-art methods in accuracy, and also discovers new formulas. |
| Researcher Affiliation | Academia | ¹College of Information Science and Technology, Beijing University of Chemical Technology; ²School of Computer Science, Shenyang Aerospace University; ³Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: The overview of our PSL. Input: target sequences $\{S^t_i\}_{i=1}^{n_t}$. Output: formulas of the target sequences $\{F^t_i\}_{i=1}^{n_t}$. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing code or a link to a code repository. |
| Open Datasets | Yes | Target Domain Sequence. The Online Encyclopedia of Integer Sequences (OEIS) (Sloane et al. 2018) is an online database with more than 360,000 integer sequences. |
| Dataset Splits | Yes | OEIS Easy25 is collected from the first 10,000 sequences of OEIS with at least 25 terms. OEIS Easy35 is collected from the first 10,000 sequences with at least 35 terms. |
| Hardware Specification | Yes | The experiments are performed on four Tesla T4 GPUs with 16 GB of memory, and one iteration takes about 40 minutes. |
| Software Dependencies | No | The paper mentions using an "encoder-decoder transformer-based architecture (Vaswani et al. 2017)" and the "Adam optimizer", but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The Adam optimizer is used with a learning rate warmed up from $10^{-7}$ to $4\times10^{-4}$ over the first 200 steps, then decayed proportionally to the reciprocal square root of the step number. During inference, k = 32 candidate formulas are generated for each target sequence through beam search; the value of k is determined experimentally. In our experiments, the total number of iterations is 50. ... For the first iteration (Iter1), we train 10 epochs on randomly generated source data, which contain 25 million ordinary recurrence sequence-formula pairs and 25 million linear recurrence sequence-formula pairs. For the other iterations (Iter2-Iter50), we train on the target domain data for 100 epochs with a batch size of 256. |
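The warmup-then-inverse-square-root schedule quoted in the Experiment Setup row can be sketched as below. The endpoints ($10^{-7}$, $4\times10^{-4}$) and the 200-step warmup come from the paper's description; the linear shape of the warmup ramp and the matching at the warmup boundary are assumptions, since the paper does not spell them out.

```python
def lr_schedule(step, warmup_steps=200, lr_init=1e-7, lr_peak=4e-4):
    """Sketch of the described schedule: warmup, then inverse-sqrt decay.

    lr_init, lr_peak, and warmup_steps are taken from the paper's setup
    description; the linear interpolation during warmup is an assumption.
    """
    if step < warmup_steps:
        # Ramp linearly from lr_init to lr_peak over the first warmup_steps.
        return lr_init + (lr_peak - lr_init) * step / warmup_steps
    # Decay proportional to 1/sqrt(step), continuous at the warmup boundary.
    return lr_peak * (warmup_steps / step) ** 0.5
```

With these constants, the rate peaks at $4\times10^{-4}$ at step 200 and falls to half that value by step 800, since $\sqrt{200/800} = 1/2$.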