RAG-SR: Retrieval-Augmented Generation for Neural Symbolic Regression
Authors: Hengzhe Zhang, Qi Chen, Bing Xue, Wolfgang Banzhaf, Mengjie Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our framework achieves state-of-the-art accuracy across 25 regression algorithms and 120 regression tasks. |
| Researcher Affiliation | Academia | Hengzhe Zhang1,3, Qi Chen1,3, Bing Xue1,3, Wolfgang Banzhaf2, Mengjie Zhang1,3 1School of Engineering and Computer Science, Victoria University of Wellington, New Zealand 2Department of Computer Science and Engineering, Michigan State University, USA 3Centre for Data Science and Artificial Intelligence, Victoria University of Wellington, New Zealand |
| Pseudocode | Yes | Algorithm 1 Semantic Descent. 1: Input: Features Φ = {ϕ1, . . . , ϕm}, semantics library L, neural network model N, current semantics Φ(X), target Y, neural generation probability Pneural. 2: Output: Updated features Φ. 3: O ← Random permutation of {1, 2, . . . , m} ▷ Shuffle tree indices |
| Open Source Code | Yes | Emails: EMAIL, EMAIL. 1Source Code: https://github.com/hengzhe-zhang/RAG-SR |
| Open Datasets | Yes | In this study, we primarily focus on 120 black-box datasets from the PMLB benchmark (Olson et al., 2017), which are particularly challenging for pre-training methods (Kamienny et al., 2022) due to the potential absence of simple symbolic expressions to model these datasets. The results on the 119 Feynman and 14 Strogatz datasets are presented in Appendix L.2. |
| Dataset Splits | Yes | The evaluation follows the established procedures of state-of-the-art symbolic regression benchmarks (La Cava et al., 2021). Specifically, each dataset is split into training and testing sets with a 75:25 ratio, and experiments are repeated 10 times for robustness. |
| Hardware Specification | No | This discrepancy is partly due to the fact that, in the current implementation, all neural networks in RAG-SR are trained on a CPU due to limited computational resources. Training the neural networks on a GPU could potentially reduce the computational time of RAG-SR. |
| Software Dependencies | No | The paper mentions 'SymPy-compatible expression' in the context of model complexity, and 'Adam optimizer' and 'cosine annealing with warm restarts', without specific version numbers for the software libraries used (e.g., Python, PyTorch, TensorFlow, scikit-learn). Therefore, specific software dependencies with version numbers are not provided. |
| Experiment Setup | Yes | Parameter Settings: For the neural network, the dropout rate is set to 0.1. The MLP consists of 3 layers, while both the encoder and decoder Transformers have 1 layer each. The hidden layer size is set to 64 neurons. A learning rate of 0.01 and a batch size of 64 are used. Early stopping with a patience of 5 epochs is employed to prevent overfitting. The weight of contrastive loss λ is set to 0.05. For GP, we follow conventional parameter settings: a population size of 200 and a maximum of 100 generations. Each solution consists of 10 trees, representing 10 features. The probability of using neural generation, Pneural, is set to 0.1. |
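The loop structure of Algorithm 1 (Semantic Descent) quoted in the table can be sketched as follows. This is a minimal illustration of the control flow only, assuming stand-in callables for the paper's neural network N (neural generation) and semantics library L (retrieval); the names `neural_generate` and `retrieve_from_library` are illustrative, not from the authors' code.

```python
import random

def semantic_descent(features, neural_generate, retrieve_from_library,
                     p_neural=0.1, rng=None):
    """Sketch of Semantic Descent: visit the m feature trees in a random
    order and replace each one either via the neural generator (with
    probability p_neural, the paper's Pneural = 0.1) or via retrieval
    from the semantics library."""
    rng = rng or random.Random(0)
    order = list(range(len(features)))
    rng.shuffle(order)  # step 3: random permutation of tree indices
    for i in order:
        if rng.random() < p_neural:
            features[i] = neural_generate(i)       # stand-in for network N
        else:
            features[i] = retrieve_from_library(i)  # stand-in for library L
    return features
```

The randomized visiting order matches step 3 of the quoted pseudocode; the probabilistic choice between neural generation and library retrieval reflects the Pneural = 0.1 setting listed under Experiment Setup.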
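The evaluation protocol quoted under Dataset Splits (75:25 train/test split, repeated 10 times) can be sketched with stdlib Python only. The helper names below are illustrative stand-ins, not the benchmark's actual code:

```python
import random

def train_test_split(n, test_frac=0.25, seed=0):
    """Return disjoint shuffled train/test index lists with the given
    test fraction (25% held out, as in the SRBench-style protocol)."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = int(round(n * test_frac))
    return idx[n_test:], idx[:n_test]

def repeated_evaluation(score_fn, n, n_repeats=10):
    """Run score_fn(train_idx, test_idx) over n_repeats independent
    splits, one per seed, mirroring the 10 repetitions in the paper."""
    return [score_fn(*train_test_split(n, seed=s)) for s in range(n_repeats)]
```

Reporting a score distribution over 10 independent splits, rather than a single split, is what the quoted passage refers to as repeating experiments "for robustness".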