Autoformulation of Mathematical Optimization Models Using LLMs

Authors: Nicolás Astorga, Tennison Liu, Yuanzhang Xiao, Mihaela van der Schaar

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical analysis on linear and mixed-integer programming benchmarks demonstrates our method's effectiveness, with significant performance gains from both LLM-based value estimation and symbolic pruning techniques.
Researcher Affiliation Academia 1DAMTP, University of Cambridge, Cambridge, UK 2ECE, University of Hawaii at Manoa, Honolulu, USA. Correspondence to: Nicolás Astorga, Tennison Liu <EMAIL>.
Pseudocode No The paper only describes the MCTS algorithm steps in narrative text, without a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code Yes We provide the code to reproduce our results at https://github.com/jumpynitro/AutoFormulator.
Open Datasets Yes We evaluate our methods on four real-world benchmarks: NLP4OPT (Ramamonjison et al., 2023), a curated set of 244 linear programming problems (based on (Tang et al., 2024)); IndustryOR (Tang et al., 2024), consisting of 100 problems spanning linear, integer, and mixed-integer programming at various difficulty levels; ComplexOR (Xiao et al., 2023), with 37 real-world operations research problems from diverse domains; and MAMO (Huang et al., 2024b), using the more advanced ComplexLP subset, which includes 211 problems.
Dataset Splits No The paper reports accuracy on existing benchmarks (NLP4OPT, IndustryOR, ComplexOR, MAMO) but does not explicitly describe any training/validation/test splits, either for its own experiments or for the benchmarks themselves.
Hardware Specification No The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies No The paper mentions several solvers (Gurobi, CVXPY, SMT solvers, TRCA, SLSQP, COBYLA, COBYQA, CLARABEL, ECOS, SCS, OSQP) but does not provide specific version numbers for these software dependencies as required for reproducibility.
Experiment Setup Yes We configure our method with H = 10 candidate formulations, I = 3 children retained after pruning and scoring, and T = 16 total rollouts.
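
The reported hyperparameters can be captured in a small configuration object. This is a minimal illustrative sketch, not taken from the released code; the class and field names are assumptions, while the values (H = 10, I = 3, T = 16) come directly from the paper's stated setup.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SearchConfig:
    """Hypothetical container for the search hyperparameters reported in the paper."""
    num_candidates: int = 10  # H: candidate formulations generated per expansion
    num_children: int = 3     # I: children retained after pruning and scoring
    num_rollouts: int = 16    # T: total rollouts of the tree search

config = SearchConfig()
print(config.num_candidates, config.num_children, config.num_rollouts)
```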