Guiding Large Language Models in Modeling Optimization Problems via Question Partitioning
Authors: Xiaotian Pan, Junhao Fang, Feng Wu, Sijia Zhang, Yi-Xiang Hu, Shaoang Li, Xiang-Yang Li
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments demonstrate that our method improves performance on the common benchmark dataset NLP4LP, achieving an accuracy of 62.3% and a code executability rate of 86.8% when tested on GPT-4, both outperforming existing methods. Additionally, we demonstrate the effectiveness of our PaMOP in handling large real-world problems. Ablation studies further confirm the importance of using the partition tree in enhancing model performance. |
| Researcher Affiliation | Academia | School of Computer Science and Technology, University of Science and Technology of China |
| Pseudocode | No | The paper describes methods and processes in natural language and mathematical formulations, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block, nor structured steps formatted like code. |
| Open Source Code | No | The paper does not contain an unambiguous statement that the authors are releasing their code, nor does it provide a direct link to a source-code repository. |
| Open Datasets | Yes | The NLP4LP dataset [Ahmadi Teshnizi et al., 2024] is collected from optimization textbooks and manuals. It includes problems such as network flow, scheduling, combinatorial optimization, and more. In total, it contains 54 LP problems and 13 MILP problems. |
| Dataset Splits | No | The paper mentions using the NLP4LP dataset and a custom set of real-world problems but does not specify exact training, validation, or test splits for these datasets. It notes, "Each example contains a description of the problem, the classification of the problem, the dimensions of the input data, and the data file," and "To adapt the dataset to the AMPL format, we have preprocessed the dataset's data.json into a data.dat version," but no explicit split information is provided. |
| Hardware Specification | No | The paper mentions testing the system using GPT-4 but does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running their experiments or computations. |
| Software Dependencies | No | The paper mentions using AMPL and Gurobi but does not specify their version numbers. For example, "We use AMPL [Fourer et al., 1987] for modeling, as it separates the model and data files. Unlike humans, LLMs treat mathematical formulas, modeling languages, and programming languages as different languages, so we directly generate code in the modeling language instead of formulas." and "we use AMPL to call Gurobi to solve the model" |
| Experiment Setup | Yes | For these experiments, we set the model's temperature to 0.2 (controls randomness of the model's output) and the maximum number of failed iterations to 5. |