Autoformulation of Mathematical Optimization Models Using LLMs
Authors: Nicolás Astorga, Tennison Liu, Yuanzhang Xiao, Mihaela van der Schaar
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical analysis on linear and mixed-integer programming benchmarks demonstrates our method's effectiveness, with significant performance gains from both LLM-based value estimation and symbolic pruning techniques. |
| Researcher Affiliation | Academia | DAMTP, University of Cambridge, Cambridge, UK; ECE, University of Hawaii at Manoa, Honolulu, USA. Correspondence to: Nicolás Astorga, Tennison Liu <EMAIL>. |
| Pseudocode | No | The paper only describes the MCTS algorithm steps in narrative text, without a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | We provide the code to reproduce our results at https://github.com/jumpynitro/AutoFormulator |
| Open Datasets | Yes | We evaluate our methods on four real-world benchmarks: NL4OPT (Ramamonjison et al., 2023), a curated set of 244 linear programming problems (based on (Tang et al., 2024)); IndustryOR (Tang et al., 2024), consisting of 100 problems spanning linear, integer, and mixed-integer programming at various difficulty levels; ComplexOR (Xiao et al., 2023), with 37 real-world operations research problems from diverse domains; and MAMO (Huang et al., 2024b), using the more advanced ComplexLP subset, which includes 211 problems. |
| Dataset Splits | No | The paper evaluates on existing benchmarks (NL4OPT, IndustryOR, ComplexOR, MAMO) and reports accuracy, but does not explicitly describe any training/validation/test splits used for its experiments or by the benchmarks themselves in the context of the reported results. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions several solvers (Gurobi, CVXPY, SMT solvers, TRCA, SLSQP, COBYLA, COBYQA, CLARABEL, ECOS, SCS, OSQP) but does not provide specific version numbers for these software dependencies as required for reproducibility. |
| Experiment Setup | Yes | We configure our method with H = 10 candidate formulations, I = 3 children retained after pruning and scoring, and T = 16 total rollouts. |
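The reported configuration (H = 10 candidate formulations per expansion, I = 3 children retained after pruning and scoring, T = 16 total rollouts) can be illustrated with a minimal search sketch. This is a hypothetical stand-in, not the paper's implementation: `generate_candidates` and `score` are placeholders for the LLM-based generation and value estimation, and the top-I selection is a crude stand-in for the symbolic pruning step.

```python
import heapq
import random

# Hyperparameters as reported in the paper's experiment setup.
H = 10  # candidate formulations generated per expansion
I = 3   # children retained after pruning and scoring
T = 16  # total rollouts


def generate_candidates(node, h=H):
    """Stand-in for LLM generation of h candidate formulations."""
    return [f"{node}/cand{i}" for i in range(h)]


def score(candidate):
    """Stand-in for LLM-based value estimation (deterministic per candidate)."""
    rng = random.Random(hash(candidate) % (2**32))
    return rng.random()


def search(root, rollouts=T):
    """Best-first sketch: expand a node, keep the top-I children, repeat."""
    frontier = [root]
    best, best_score = root, float("-inf")
    for _ in range(rollouts):
        node = frontier.pop(0)
        top = heapq.nlargest(I, generate_candidates(node), key=score)
        frontier.extend(top)
        for candidate in top:
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
    return best
```

Since each rollout removes one node from the frontier and adds I = 3, the frontier never empties within T = 16 rollouts.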
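Since the paper names its solvers (Gurobi, CVXPY, etc.) without pinning versions, a reproducer may want to record the installed versions alongside results. A minimal sketch using only the standard library; the package names queried at the bottom are assumptions about the solver stack, not taken from the paper:

```python
import importlib.metadata


def package_versions(names):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Hypothetical solver-stack packages one might pin for this pipeline.
    print(package_versions(["gurobipy", "cvxpy", "osqp", "ecos", "scs"]))
```

Logging this dict next to each experiment's output would close the version-reporting gap noted in the Software Dependencies row.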