Dimension Reduction for Symbolic Regression

Authors: Paul Kahlmeyer, Markus Fischer, Joachim Giesen

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach in Section 5 on the Wikipedia eponymous equations data set (Guimerà et al. 2020) and on the Feynman symbolic regression data set¹. Finally, we draw some conclusions in Section 6. [...] In a second experiment, we evaluate the effectiveness of combining our beam search with different state-of-the-art symbolic regression algorithms. [...] Results on the Feynman equations data set are shown in Table 4.
Researcher Affiliation | Academia | Paul Kahlmeyer, Markus Fischer, Joachim Giesen, Friedrich Schiller University Jena, EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the beam search process and uses an illustration in Figure 2, but it does not provide a structured pseudocode or algorithm block.
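For readers unfamiliar with the technique being assessed, a generic beam search keeps only the top-k candidates at each expansion step. The sketch below is purely illustrative of the kind of algorithm block the paper omits; it is not the authors' implementation, and the `expand` and `score` callables are hypothetical placeholders, not functions from the paper.

```python
def beam_search(start, expand, score, beam_size=1, max_depth=3):
    """Generic beam search sketch (illustrative only, not the paper's method).

    expand(state) -> iterable of successor states (hypothetical callback)
    score(state)  -> float, higher is better (hypothetical callback)
    """
    beam = [start]
    best = start
    for _ in range(max_depth):
        # Generate all successors of the states currently in the beam.
        candidates = [c for s in beam for c in expand(s)]
        if not candidates:
            break
        # Keep only the beam_size highest-scoring candidates.
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_size]
        # Track the best state seen so far.
        if score(beam[0]) > score(best):
            best = beam[0]
    return best


# Toy usage: grow strings over {"a", "b"}, rewarding the count of "a".
expand = lambda s: [s + ch for ch in "ab"]
score = lambda s: s.count("a")
result = beam_search("", expand, score, beam_size=1, max_depth=3)
# -> "aaa"
```

With `beam_size=1` (the setting the assessed paper reports), beam search degenerates to a greedy search that follows the single best successor at each depth.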
Open Source Code | No | The paper does not contain an explicit statement from the authors releasing their code or a direct link to a code repository for their methodology. It mentions third-party tools like SymPy and other symbolic regression algorithms, and provides a link to a dataset used, but not the source code of their own implementation.
Open Datasets | Yes | For our experiments, we have used two sets of regression problems, namely, Wikipedia's list of 880 eponymous equations (Guimerà et al. 2020) and 114 formulas that were extracted from the Feynman lecture notes of physics (Udrescu and Tegmark 2020). We provide more details about both data sets in the full version of the paper. ¹https://space.mit.edu/home/tegmark/aifeynman.html
Dataset Splits | No | The paper mentions using "hold out data" and sampling from functions, and that the Feynman dataset samples are "directly given by La Cava et al. (2021), who also add different levels of Gaussian noise". However, it does not specify exact training/test/validation splits (e.g., percentages, sample counts, or specific predefined split references) in the provided text.
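To make concrete what a reproducible split specification looks like, the sketch below implements a seeded hold-out split. The 25% test fraction and the seed are illustrative assumptions; they are not values reported in the assessed paper.

```python
import random


def holdout_split(samples, test_fraction=0.25, seed=0):
    """Deterministic hold-out split (fraction and seed are illustrative).

    Returns (train, test) with no overlap; the seed makes the split
    reproducible across runs, which is what the assessed paper leaves
    unspecified.
    """
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_test = int(len(samples) * test_fraction)
    test = [samples[i] for i in idx[:n_test]]
    train = [samples[i] for i in idx[n_test:]]
    return train, test


# Toy usage: 100 samples -> 75 train, 25 test.
train, test = holdout_split(list(range(100)))
```

Reporting the fraction, the seed, and the splitting procedure together is what allows a third party to reconstruct the exact train/test partition.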
Hardware Specification | Yes | All experiments were run on a computer with an Intel Xeon Gold 6226R 64-core processor, 128 GB of RAM, running Python 3.10.
Software Dependencies | No | While "Python 3.10" is mentioned, no other key libraries or software components are listed with specific version numbers. References to "SymPy" or other symbolic regression algorithms are to the tools themselves, not to the specific versions the authors used as dependencies for their own implementation.
Experiment Setup | No | The paper states, "The beam search in the experiment uses beam size 1 and the CODEC functional dependence measure." and "To keep the search space small, we consider only expression DAGs with at most one intermediary node and one output node". However, it explicitly defers details for other settings: "An overview of the respective hyperparameters can be found in the full version of the paper," indicating that specific hyperparameter values for the symbolic regression algorithms are not provided in this extract.