Dimension Reduction for Symbolic Regression

Authors: Paul Kahlmeyer, Markus Fischer, Joachim Giesen

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach in Section 5 on the Wikipedia eponymous equations data set (Guimerà et al. 2020) and on the Feynman symbolic regression data set¹. Finally, we draw some conclusions in Section 6. [...] In a second experiment, we evaluate the effectiveness of combining our beam search with different state-of-the-art symbolic regression algorithms. [...] Results on the Feynman equations data set are shown in Table 4.
Researcher Affiliation | Academia | Paul Kahlmeyer, Markus Fischer, Joachim Giesen, Friedrich Schiller University Jena, EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the beam search process and uses an illustration in Figure 2, but it does not provide a structured pseudocode or algorithm block.
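For readers unfamiliar with the technique being assessed, a generic beam search keeps only the top-k candidates at each expansion step. The sketch below is purely illustrative of the kind of algorithm block the paper omits; it is not the authors' implementation, and the `expand` and `score` callables are hypothetical placeholders, not functions from the paper.

```python
def beam_search(start, expand, score, beam_size=1, max_depth=3):
    """Generic beam search sketch (illustrative only, not the paper's method).

    expand(state) -> iterable of successor states (hypothetical callback)
    score(state)  -> float, higher is better (hypothetical callback)
    """
    beam = [start]
    best = start
    for _ in range(max_depth):
        # Generate all successors of the states currently in the beam.
        candidates = [c for s in beam for c in expand(s)]
        if not candidates:
            break
        # Keep only the beam_size highest-scoring candidates.
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_size]
        # Track the best state seen so far.
        if score(beam[0]) > score(best):
            best = beam[0]
    return best


# Toy usage: grow strings over {"a", "b"}, rewarding the count of "a".
expand = lambda s: [s + ch for ch in "ab"]
score = lambda s: s.count("a")
result = beam_search("", expand, score, beam_size=1, max_depth=3)
# -> "aaa"
```

With `beam_size=1` (the setting the assessed paper reports), beam search degenerates to a greedy search that follows the single best successor at each depth.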
Open Source Code | No | The paper does not contain an explicit statement from the authors releasing their code or a direct link to a code repository for their methodology. It mentions third-party tools like SymPy and other symbolic regression algorithms, and provides a link to a dataset used, but not the source code of their own implementation.
Open Datasets | Yes | For our experiments, we have used two sets of regression problems, namely, Wikipedia's list of 880 eponymous equations (Guimerà et al. 2020) and 114 formulas that were extracted from the Feynman lecture notes of physics (Udrescu and Tegmark 2020). We provide more details about both data sets in the full version of the paper. ¹https://space.mit.edu/home/tegmark/aifeynman.html
Dataset Splits | No | The paper mentions using "hold out data" and sampling from functions, and that the Feynman dataset samples are "directly given by La Cava et al. (2021), who also add different levels of Gaussian noise". However, it does not specify exact training/test/validation splits (e.g., percentages, sample counts, or specific predefined split references) in the provided text.
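To make concrete what a reproducible split specification looks like, the sketch below implements a seeded hold-out split. The 25% test fraction and the seed are illustrative assumptions; they are not values reported in the assessed paper.

```python
import random


def holdout_split(samples, test_fraction=0.25, seed=0):
    """Deterministic hold-out split (fraction and seed are illustrative).

    Returns (train, test) with no overlap; the seed makes the split
    reproducible across runs, which is what the assessed paper leaves
    unspecified.
    """
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_test = int(len(samples) * test_fraction)
    test = [samples[i] for i in idx[:n_test]]
    train = [samples[i] for i in idx[n_test:]]
    return train, test


# Toy usage: 100 samples -> 75 train, 25 test.
train, test = holdout_split(list(range(100)))
```

Reporting the fraction, the seed, and the splitting procedure together is what allows a third party to reconstruct the exact train/test partition.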
Hardware Specification | Yes | All experiments were run on a computer with an Intel Xeon Gold 6226R 64-core processor, 128 GB of RAM, running Python 3.10.
Software Dependencies | No | While "Python 3.10" is mentioned, no other key libraries or software components are listed with specific version numbers. References to "SymPy" or other symbolic regression algorithms are to the tools themselves, not to the specific versions the authors used as dependencies for their own implementation.
Experiment Setup | No | The paper states, "The beam search in the experiment uses beam size 1 and the CODEC functional dependence measure." and "To keep the search space small, we consider only expression DAGs with at most one intermediary node and one output node". However, it explicitly defers details for other settings: "An overview of the respective hyperparameters can be found in the full version of the paper," indicating that specific hyperparameter values for the symbolic regression algorithms are not provided in this extract.