Symbolic Regression via Deep Reinforcement Learning Enhanced Genetic Programming Seeding
Authors: T. Nathan Mundhenk, Mikel Landajuela, Ruben Glatt, Claudio P. Santiago, Daniel M. Faissol, Brenden K. Petersen
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On a number of common benchmark tasks to recover underlying expressions from a dataset, our method recovers 65% more expressions than a recently published top-performing model using the same experimental setup. |
| Researcher Affiliation | Government Research Lab | T. Nathan Mundhenk, Mikel Landajuela, Ruben Glatt, Claudio P. Santiago, Daniel M. Faissol, Brenden K. Petersen; Computational Engineering Division, Lawrence Livermore National Laboratory, Livermore, CA 94550 |
| Pseudocode | Yes | Algorithm 1 Neural-guided genetic programming population seeding |
| Open Source Code | Yes | Source code is provided at www.github.com/brendenpetersen/deep-symbolic-optimization. |
| Open Datasets | Yes | We used two popular benchmark problem sets to compare our technique to other methods: Nguyen [Uy et al., 2014] and the R rationals [Krawiec and Pawlak, 2013]. Additionally, we introduce a new benchmark problem set with this work, which we call Livermore. |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits (exact percentages, sample counts, or direct references to predefined splits) needed for reproduction in its main text. |
| Hardware Specification | Yes | Experiments were conducted on 36 core, 2.1 GHz, Intel Xeon E5-2695 workstations. |
| Software Dependencies | No | The paper mentions using 'DEAP' for the genetic programming component, but does not provide specific version numbers for DEAP or any other ancillary software components used in the experiments. |
| Experiment Setup | Yes | For all algorithms, we tuned hyperparameters using Nguyen-7 and R-3. Hyperparameters are shown in Appendix Table 11. An important hyperparameter in our method is S, the number of GP generations to perform per RNN training step. Figure 2 shows a post-hoc analysis of how performance varies depending on how many GP steps we do between each RNN training step. The optimal number of steps is between 10 and 25. |
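The paper's Algorithm 1 (neural-guided genetic programming population seeding) alternates between an RNN that samples a seed population and a GP component that runs S generations on that population, with the resulting elites used to train the RNN. The sketch below illustrates this outer loop structure only; it is not the authors' implementation. The function names, the polynomial-coefficient encoding of expressions, and the simple tournament-plus-mutation GP are all illustrative assumptions (the real method samples token sequences with an RNN and runs GP via DEAP).

```python
import random

def sample_from_rnn(pop_size, rng):
    """Stub standing in for the RNN sampler (assumption: the real method
    samples expression token sequences; here each 'expression' is just a
    coefficient list for c0 + c1*x + c2*x^2)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(pop_size)]

def fitness(coeffs, data):
    """Negative mean squared error of the quadratic against the targets."""
    err = 0.0
    for x, y in data:
        pred = coeffs[0] + coeffs[1] * x + coeffs[2] * x * x
        err += (pred - y) ** 2
    return -err / len(data)

def gp_generation(pop, data, rng, sigma=0.1):
    """One toy GP generation: tournament selection plus Gaussian mutation."""
    new_pop = []
    for _ in range(len(pop)):
        a, b = rng.sample(pop, 2)
        parent = a if fitness(a, data) >= fitness(b, data) else b
        new_pop.append([c + rng.gauss(0.0, sigma) for c in parent])
    return new_pop

def neural_guided_gp_step(data, pop_size=32, S=10, seed=0):
    """One outer iteration: seed the GP population from the RNN, run S GP
    generations (the hyperparameter discussed in the table above), and
    return the elite individuals that would be used to train the RNN."""
    rng = random.Random(seed)
    pop = sample_from_rnn(pop_size, rng)   # population seeded by the RNN
    for _ in range(S):                     # S GP generations per RNN step
        pop = gp_generation(pop, data, rng)
    pop.sort(key=lambda ind: fitness(ind, data), reverse=True)
    return pop[: pop_size // 4]            # elites fed back to RNN training

if __name__ == "__main__":
    # Toy regression target: y = 1 + 2x, so ideal coefficients are [1, 2, 0].
    data = [(x / 10.0, 1 + 2 * (x / 10.0)) for x in range(-10, 11)]
    elites = neural_guided_gp_step(data)
    print(len(elites), round(fitness(elites[0], data), 4))
```

In the paper, the elites' token sequences are used as training samples for the RNN's risk-seeking policy-gradient update, closing the loop; the sketch stops before that step and simply returns them.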