Noise-Resilient Symbolic Regression with Dynamic Gating Reinforcement Learning
Authors: Chenglu Sun, Shuo Shen, Wenzhi Tao, Deyi Xue, Zixia Zhou
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our method significantly outperforms several popular baselines on benchmarks with high-noise data. Furthermore, our method can also achieve state-of-the-art performance on benchmarks with clean data, showcasing its robustness and efficacy in SR tasks. |
| Researcher Affiliation | Collaboration | Chenglu Sun¹, Shuo Shen¹, Wenzhi Tao¹, Deyi Xue¹, Zixia Zhou²* — ¹Cooperation Product Department, Interactive Entertainment Group, Tencent; ²Stanford University. EMAIL, EMAIL |
| Pseudocode | Yes | The pseudocode of NRSR is shown in Appendix. |
| Open Source Code | No | The paper does not explicitly state that the authors are releasing their source code for the methodology described. It mentions using an existing SR framework and a commercial software, but not providing their own implementation code. |
| Open Datasets | Yes | In this study, we employed the Nguyen SR benchmark suite (Uy et al. 2011) to assess our proposed method. |
| Dataset Splits | Yes | Datasets are generated using the ground truth and the input range, and are subsequently divided into three segments: one for training the NGM, one for calculating the fitness reward R(τ) of the expressions generated during the training process, and one for evaluating the best fit expression after each training iteration. The sample sizes for these three subsets are 20,000, 20, and 20, respectively. |
| Hardware Specification | No | The paper states: "Detailed specifications of the training settings can be found in Appendix." However, the main text does not contain any specific hardware details such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions several algorithms and tools like RNN, PPO, GP, Eureqa, and Data Robot platform, but does not provide specific version numbers for any software libraries or dependencies (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | As β is incremented from 0 to 0.06, there is a notable inflection in performance: the RR exhibits an initial ascent followed by a descent, while the EEN shows the converse trend, an initial decline transitioning into an ascent. This pattern implies a trade-off inherent in the MPE: it can improve exploratory behavior, yet an overly large β may heighten the algorithm's computational expenditure. Settings Accuracy ... gating loss λ 0.1 ... gating loss λ 0.25 ... gating loss λ 0.5. Datasets are generated using the ground truth and the input range, and are subsequently divided into three segments: one for training the NGM, one for calculating the fitness reward R(τ) of the expressions generated during training, and one for evaluating the best-fit expression after each training iteration. The sample sizes for these three subsets are 20,000, 20, and 20, respectively. Unless otherwise specified, all results reported in this study are the average of 100 replicated tests, each with a different random seed, for each benchmark expression. |
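The 20,000 / 20 / 20 split described in the table (inputs sampled from the benchmark's input range, targets computed from the ground-truth expression) might be generated along the following lines. This is a minimal sketch, not the authors' code; the function name `make_sr_splits` and the uniform-sampling assumption are illustrative only.

```python
import numpy as np

def make_sr_splits(ground_truth, input_range, seed=0,
                   n_train=20_000, n_fitness=20, n_eval=20):
    """Sample inputs from `input_range`, evaluate the ground-truth
    expression, and split into the three subsets the paper describes:
    NGM training, fitness-reward R(tau) computation, and evaluation.
    Hypothetical helper -- names are not from the paper's implementation."""
    rng = np.random.default_rng(seed)
    lo, hi = input_range
    n_total = n_train + n_fitness + n_eval
    x = rng.uniform(lo, hi, size=n_total)
    y = ground_truth(x)
    data = np.stack([x, y], axis=1)
    return (data[:n_train],                      # train the noise-gating model
            data[n_train:n_train + n_fitness],   # compute fitness reward R(tau)
            data[n_train + n_fitness:])          # evaluate the best-fit expression

# Example with a Nguyen-style benchmark expression x^3 + x^2 + x on [-1, 1]:
train, fit, evl = make_sr_splits(lambda x: x**3 + x**2 + x, (-1.0, 1.0))
```

The same generator could add noise to the training targets to reproduce the high-noise setting, with the fitness and evaluation subsets kept clean or noisy depending on the experiment.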