Progressive Self-Learning for Domain Adaptation on Symbolic Regression of Integer Sequences

Authors: Yaohui Zhu, Kaiming Sun, Zhengdong Luo, Lingfeng Wang

AAAI 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Experimental results on OEIS datasets demonstrate that the proposed method surpasses current state-of-the-art methods in accuracy, and also discovers new formulas.
Researcher Affiliation | Academia | (1) College of Information Science and Technology, Beijing University of Chemical Technology; (2) School of Computer Science, Shenyang Aerospace University; (3) Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences. EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: The overview of our PSL. Input: Target sequences {S_i^t}_{i=1}^{n_t}. Output: Formulas of target sequences {F_i^t}_{i=1}^{n_t}.
Open Source Code | No | The paper does not contain an explicit statement about releasing code or a link to a code repository.
Open Datasets | Yes | Target Domain Sequence. The Online Encyclopedia of Integer Sequences (OEIS) (Sloane et al. 2018) is an online database with more than 360,000 integer sequences.
Dataset Splits | Yes | OEIS Easy25 is collected from the first 10,000 sequences of OEIS that have at least 25 terms. OEIS Easy35 is collected from the first 10,000 sequences that have at least 35 terms.
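The split construction described above can be sketched as a simple filter; this is a hypothetical illustration (the paper does not publish collection code), where `sequences` is assumed to be a list of integer-term lists in OEIS order:

```python
def collect_easy(sequences, min_terms):
    """Keep sequences from the first 10,000 that have at least `min_terms` terms."""
    return [s for s in sequences[:10000] if len(s) >= min_terms]

# Usage (toy data): easy25 = collect_easy(all_sequences, 25)
#                   easy35 = collect_easy(all_sequences, 35)
```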
Hardware Specification | Yes | The experiments are performed on four Tesla T4 GPUs with 16 GB memory each, and one iteration takes about 40 minutes.
Software Dependencies | No | The paper mentions using an "encoder-decoder transformer-based architecture (Vaswani et al. 2017)" and the "Adam optimizer", but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | The Adam optimizer is utilized with a learning rate warmed up from 10^-7 to 4×10^-4 in the first 200 steps, then decaying according to the reciprocal square root of the number of steps. During inference, k = 32 candidate formulas are generated for each target sequence through beam search, and the value of k is determined experimentally. In our experiments, the total number of iterations is 50. ... For the first iteration (Iter1), we train 10 epochs on randomly generated source data, which contain 25 million ordinary recurrence sequence-formula pairs and 25 million linear recurrence sequence-formula pairs. For the other iterations (Iter2-Iter50), we train on the target domain data for 100 epochs, with a batch size of 256.
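The learning-rate schedule quoted above can be sketched as follows. This is a minimal reading of the description, assuming linear warmup from 10^-7 to 4×10^-4 over the first 200 steps and inverse-square-root decay anchored at the end of warmup; the paper does not give the exact formula:

```python
def lr_schedule(step, warmup_steps=200, lr_min=1e-7, lr_max=4e-4):
    """Assumed schedule: linear warmup, then reciprocal-square-root decay."""
    if step < warmup_steps:
        # Linear interpolation from lr_min to lr_max during warmup.
        return lr_min + (lr_max - lr_min) * step / warmup_steps
    # Decay proportional to 1/sqrt(step), continuous at step == warmup_steps.
    return lr_max * (warmup_steps / step) ** 0.5
```

At step 800, for example, this gives 4×10^-4 × sqrt(200/800) = 2×10^-4.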