reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Symbolic Regression is NP-hard

Authors: Marco Virgolin, Solon P Pissis

TMLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	Indeed, we prove that there exist instances of the SR problem for which one cannot discover the best-possible mathematical expression in polynomial time unless P=NP. Id est, SR is an NP-hard problem. Our main contribution here was to prove that symbolic regression (SR), i.e., the problem of discovering an accurate model of data in the form of a mathematical expression, is in fact NP-hard.
Researcher Affiliation	Academia	Marco Virgolin EMAIL Centrum Wiskunde & Informatica, Amsterdam, the Netherlands Solon P. Pissis EMAIL Centrum Wiskunde & Informatica, Amsterdam, the Netherlands Vrije Universiteit, Amsterdam, the Netherlands
Pseudocode	No	The paper describes various algorithms from related works (e.g., genetic algorithms, deep learning-based algorithms, Monte-Carlo tree search) but does not provide pseudocode or algorithm blocks for its own methodology.
Open Source Code	No	The paper does not contain any explicit statements or links indicating the release of source code for the methodology described.
Open Datasets	No	The paper is theoretical and focuses on proving NP-hardness of symbolic regression; it does not present any experimental results based on specific datasets. While it mentions 'SRBench, a benchmarking platform for SR that includes more than 20 algorithms and more than 250 data sets' in Section 2.1, this refers to existing benchmarks for other algorithms, not datasets used for experiments in this paper.
Dataset Splits	No	The paper is theoretical and does not perform experiments using datasets, therefore, it does not specify any dataset splits.
Hardware Specification	No	The paper is theoretical and does not perform experiments, therefore, no hardware specifications are mentioned.
Software Dependencies	No	The paper is theoretical and does not perform experiments requiring specific software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and focuses on a proof of NP-hardness; it does not include an experimental setup with hyperparameters or training configurations.