LLM-SR: Scientific Equation Discovery via Programming with Large Language Models

Authors: Parshin Shojaee, Kazem Meidani, Shashank Gupta, Amir Barati Farimani, Chandan Reddy

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate LLM-SR on four benchmark problems across diverse scientific domains (e.g., physics, biology), which we carefully designed to simulate the discovery process and prevent LLM recitation. Our results demonstrate that LLM-SR discovers physically accurate equations that significantly outperform state-of-the-art symbolic regression baselines, particularly in out-of-domain test settings.
Researcher Affiliation | Collaboration | Parshin Shojaee (1), Kazem Meidani (2), Shashank Gupta (3), Amir Barati Farimani (2), Chandan K. Reddy (1). (1) Virginia Tech, (2) Carnegie Mellon University, (3) Allen Institute for AI
Pseudocode | Yes | Algorithm 1: LLM-SR

  Input: LLM πθ, dataset D, problem T, T iterations, k in-context examples, b samples per prompt
  # Initialize population
  P0 ← InitPop()
  f*, s* ← null
  for t ← 1 to T - 1 do
      # Sample k examples from the experience buffer
      E ← {e_j}_{j=1..k}, e_j = SampleExp(P_{t-1})
      # Build a few-shot prompt with the sampled examples
      p ← MakeFewShotPrompt(E)
      # Sample b equation skeletons from the LLM
      F_t ← {f_j}_{j=1..b}, f_j ~ πθ(·|p)
      # Evaluate candidates and update the population
      for f ∈ F_t do
          s ← Score_T(f, D)
          if s > s* then f*, s* ← f, s
          P_t ← P_{t-1} ∪ {(f, s)}
      end
  end
  Output: f*, s*
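A minimal executable sketch of Algorithm 1, with the LLM backbone and the skeleton evaluator passed in as caller-supplied stand-ins (`sample_llm`, `score`, and `make_few_shot_prompt` are illustrative names, not from the paper's codebase):

```python
import random

def make_few_shot_prompt(examples):
    # Hypothetical prompt builder: list prior programs with their scores.
    return "\n".join(f"# score={s:.3f}\n{f}" for f, s in examples)

def llm_sr(sample_llm, score, dataset, T=100, k=2, b=4):
    """Sketch of the LLM-SR loop. `sample_llm(prompt)` returns one candidate
    equation program; `score(f, dataset)` fits its parameters and returns a
    fitness (higher is better)."""
    population = []                        # experience buffer P of (f, s) pairs
    best_f, best_s = None, float("-inf")
    for _ in range(T):
        # Sample up to k in-context examples from the experience buffer
        examples = random.sample(population, min(k, len(population)))
        prompt = make_few_shot_prompt(examples)
        # Draw b candidate skeletons, evaluate, and update the population
        for f in [sample_llm(prompt) for _ in range(b)]:
            s = score(f, dataset)
            if s > best_s:
                best_f, best_s = f, s
            population.append((f, s))
    return best_f, best_s

# Toy usage: "programs" are just integers; fitness peaks at 3.
cands = iter(range(100))
best_f, best_s = llm_sr(
    sample_llm=lambda p: next(cands) % 10,
    score=lambda f, d: -abs(f - 3),
    dataset=None, T=5)
```

The experience buffer doubles as both the population archive and the source of few-shot refinement examples, matching the structure of the pseudocode above.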
Open Source Code | Yes | Code and data are available: https://github.com/deep-symbolic-mathematics/LLM-SR
Open Datasets | Yes | The datasets used in this study include both publicly available and newly generated data. The material stress behavior analysis dataset (stress-strain) is publicly available under the CC BY 4.0 license and can be accessed at https://data.mendeley.com/datasets/rd6jm9tyb6/1. The remaining datasets (Oscillation 1, Oscillation 2, and E. coli Growth) were generated for this work and are released under the MIT License as part of the LLM-SR GitHub repository: https://github.com/deep-symbolic-mathematics/LLM-SR
Dataset Splits | Yes | To effectively evaluate the generalization capability of predicted equations, we employ a strategic data partitioning scheme. The simulation data is divided into three sets based on the trajectory time: (1) Training set, (2) In-domain validation set, and (3) Out-of-domain validation set. Specifically, we utilize the time interval T = [0, 20) to evaluate the out-of-domain generalization of the discovered equations.
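The time-based partitioning scheme can be sketched as below; the cut points (15 and 20) are illustrative placeholders, not the paper's exact boundaries:

```python
import numpy as np

# Hypothetical time-based split in the spirit of the paper's scheme:
# earlier trajectory times for training and in-domain validation,
# later times held out as the out-of-domain set.
t = np.linspace(0, 30, 3000)      # trajectory time stamps
x = np.sin(t)                     # toy simulated observable

train   = x[(t >= 0)  & (t < 15)]   # training set
val_in  = x[(t >= 15) & (t < 20)]   # in-domain validation
val_ood = x[t >= 20]                # out-of-domain: unseen time interval
```

Splitting by time rather than randomly ensures the out-of-domain set probes extrapolation beyond the trajectory region seen during equation discovery.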
Hardware Specification | Yes | Our experiments employ either Mixtral-8x7B (using 4 NVIDIA RTX 8000 GPUs with 48 GB memory each) or GPT-3.5-turbo (via the OpenAI API) as the language model backbone.
Software Dependencies | No | The paper mentions using Python, the `scipy` library for `numpy+BFGS` optimization, and `PyTorch` for `torch+Adam` optimization, as well as `Mixtral-8x7B` and `GPT-3.5-turbo` as LLM backbones. However, it does not provide specific version numbers for Python, `scipy`, or `PyTorch`.
Experiment Setup | Yes | In LLM-SR experiments, each iteration samples b = 4 equation skeletons per prompt at temperature τ = 0.8, optimizes parameters via numpy+BFGS or torch+Adam (with a 30-second timeout), and uses k = 2 in-context examples from the experience buffer for refinement. To control the length and complexity of the generated equations and prevent overparameterization, we cap the number of parameters (the length of the params vector) at 10 in all experiments. Evaluation is constrained by time and memory limits of T = 30 seconds and M = 2 GB, respectively.
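The numpy+BFGS parameter-optimization step can be sketched as follows: given an LLM-proposed equation skeleton with a free `params` vector (capped at length 10 in the paper), fit the constants by minimizing mean squared error on the training data. The skeleton below is an illustrative example, not one discovered by LLM-SR:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical equation skeleton with two free parameters.
def skeleton(x, params):
    a, b = params[:2]
    return a * x + b * x**2

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 1.5 * x - 0.7 * x**2          # toy ground-truth data

def mse(params):
    # Objective: mean squared error of the skeleton against the data.
    return np.mean((skeleton(x, params) - y) ** 2)

# BFGS recovers the constants of the skeleton from data.
res = minimize(mse, x0=np.zeros(2), method="BFGS")
```

In practice a per-candidate timeout (30 seconds in the paper) bounds how long this inner optimization may run before the candidate is scored or discarded.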