Progressive Self-Learning for Domain Adaptation on Symbolic Regression of Integer Sequences

Authors: Yaohui Zhu, Kaiming Sun, Zhengdong Luo, Lingfeng Wang

AAAI 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Experimental results on OEIS datasets demonstrate that the proposed method surpasses current state-of-the-art methods in accuracy, and also discovers new formulas.
Researcher Affiliation | Academia | (1) College of Information Science and Technology, Beijing University of Chemical Technology; (2) School of Computer Science, Shenyang Aerospace University; (3) Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences. EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: The overview of our PSL. Input: Target sequences {S_i^t}_{i=1}^{n_t}. Output: Formulas of target sequences {F_i^t}_{i=1}^{n_t}.
Open Source Code | No | The paper does not contain an explicit statement about releasing code or a link to a code repository.
Open Datasets | Yes | Target Domain Sequence. The Online Encyclopedia of Integer Sequences (OEIS) (Sloane et al. 2018) is an online database with more than 360,000 integer sequences.
Dataset Splits | Yes | OEIS Easy25 is collected from the first 10,000 sequences of OEIS that have at least 25 terms. OEIS Easy35 is collected from the first 10,000 sequences that have at least 35 terms.
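The split construction described above can be sketched as a simple filter; this is a hypothetical illustration (the paper does not publish collection code), where `sequences` is assumed to be a list of integer-term lists in OEIS order:

```python
def collect_easy(sequences, min_terms):
    """Keep sequences from the first 10,000 that have at least `min_terms` terms."""
    return [s for s in sequences[:10000] if len(s) >= min_terms]

# Usage (toy data): easy25 = collect_easy(all_sequences, 25)
#                   easy35 = collect_easy(all_sequences, 35)
```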
Hardware Specification | Yes | The experiments are performed on four Tesla T4 GPUs with 16 GB memory each, and one iteration takes about 40 minutes.
Software Dependencies | No | The paper mentions using an "encoder-decoder transformer-based architecture (Vaswani et al. 2017)" and the "Adam optimizer", but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | The Adam optimizer is utilized with a learning rate warmed up from 10^-7 to 4×10^-4 in the first 200 steps, then decaying according to the reciprocal square root of the number of steps. During inference, k = 32 candidate formulas are generated for each target sequence through beam search, and the value of k is determined experimentally. In our experiments, the total number of iterations is 50. ... For the first iteration (Iter1), we train 10 epochs on randomly generated source data, which contain 25 million ordinary recurrence sequence-formula pairs and 25 million linear recurrence sequence-formula pairs. For the other iterations (Iter2-Iter50), we train on the target domain data for 100 epochs, with a batch size of 256.
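The learning-rate schedule quoted above can be sketched as follows. This is a minimal reading of the description, assuming linear warmup from 10^-7 to 4×10^-4 over the first 200 steps and inverse-square-root decay anchored at the end of warmup; the paper does not give the exact formula:

```python
def lr_schedule(step, warmup_steps=200, lr_min=1e-7, lr_max=4e-4):
    """Assumed schedule: linear warmup, then reciprocal-square-root decay."""
    if step < warmup_steps:
        # Linear interpolation from lr_min to lr_max during warmup.
        return lr_min + (lr_max - lr_min) * step / warmup_steps
    # Decay proportional to 1/sqrt(step), continuous at step == warmup_steps.
    return lr_max * (warmup_steps / step) ** 0.5
```

At step 800, for example, this gives 4×10^-4 × sqrt(200/800) = 2×10^-4.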