Generating Streamlining Constraints with Large Language Models
Authors: Florentina Voboril, Vaidyanathan Peruvemba Ramaswamy, Stefan Szeider
JAIR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluated across seven diverse constraint satisfaction problems, our method achieves substantial runtime reductions. ... We evaluate StreamLLM extensively across seven diverse constraint problems, including standard benchmarks and a novel hypergraph coloring problem, using hundreds of instances specified in MiniZinc. The problems are formulated in the MiniZinc constraint programming language (Nethercote et al. 2007). We test StreamLLM with two state-of-the-art LLMs (GPT-4o and Claude 3.5 Sonnet), different prompting variants, and different strategies for combining adaptive prompting with test runs on easy instances. Our results demonstrate remarkable effectiveness, with some streamliners reducing solving time by up to 99%. |
| Researcher Affiliation | Academia | FLORENTINA VOBORIL, VAIDYANATHAN PERUVEMBA RAMASWAMY, and STEFAN SZEIDER, Technische Universität Wien, Austria |
| Pseudocode | Yes | Algorithm 1 outlines the general strategy, which can be summarized as follows. |
| Open Source Code | Yes | The source code and benchmark instances can be found on Zenodo (Voboril, Peruvemba Ramaswamy, and Szeider 2025). |
| Open Datasets | Yes | All instances, MiniZinc models, and the Python implementation of StreamLLM (which includes the prompts) are available on Zenodo: https://zenodo.org/doi/10.5281/zenodo.13331048 |
| Dataset Splits | Yes | For each problem, we create 15 training instances that take less than 10 seconds. Further, we create at least 50 test instances that take between 10 minutes and 2 hours to solve. ... The offline approach investigates whether extended training time improves streamliner quality. We increase the timeout from 10 minutes to 4 hours and expand the training set from 15 instances solvable within 10 seconds to 35 instances with solving times between 1 and 30 seconds. |
| Hardware Specification | Yes | We evaluate the running times of the test instances on compute nodes with two 2.4 GHz 10-core Intel Xeon E5-2640 v4 processors with 80 GB of RAM each. |
| Software Dependencies | Yes | We use Mini Zinc 2.8.3 and the Chuffed 0.13.1 solver. As LLMs we use GPT-4o (gpt-4o-2024-11-20) and Claude 3.5 Sonnet (claude-3-5-sonnet-20241022) and access them via the openai 1.29.0 and anthropic 0.25.8 packages in Python 3.11.5. |
| Experiment Setup | Yes | Our system submits several queries (prompts) to the LLM that result in candidate streamliners and tests the candidate streamliners on a small set of n_train easy training instances that can be solved in less than t_train seconds. Rigorous testing on several problems indicates that a relatively small number of test instances suffices (about 15) and even small and easy test instances (solvable in under 10 seconds with the unstreamlined model) provide a good indication of how well streamliners will work on large and hard instances; we lay out these experiments in more detail in Sections 4.5 and 4.6. ... Algorithm 1 outlines the general strategy... Then, for about ten minutes, the algorithm repeatedly prompts the LLM to generate candidate streamliners and evaluates their performance on the training instances. ... We use sets of training instances with upper bounds t_train ∈ {10, 20, 30, 60} seconds. The experiments show that t_train has no influence on the one streamliner that was picked in the end, hence we settled on the upper bound t_train = 10 seconds for the following experiments. ... setting n_train = 15 seems a fair compromise for the forthcoming experiments. |
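The generate-and-test loop described in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the paper's implementation: `propose_streamliner` and `solve_with_streamliner` are hypothetical stand-ins for the real LLM query (via the `openai`/`anthropic` clients) and the MiniZinc/Chuffed run, and the candidate pool and speedup factors are invented for the simulation.

```python
import random

def propose_streamliner(problem_description, seen):
    """Hypothetical stand-in for an LLM prompt that returns a candidate
    streamlining constraint; samples from a fixed pool so the sketch
    runs without API access, preferring candidates not yet tried."""
    pool = ["alldifferent_extra", "symmetry_break_lex", "fix_first_var",
            "restrict_domain_half", "no_op"]
    fresh = [c for c in pool if c not in seen]
    return random.choice(fresh) if fresh else random.choice(pool)

def solve_with_streamliner(instance, streamliner):
    """Hypothetical stand-in for a MiniZinc/Chuffed solver call.
    Returns (solved, runtime_seconds); the speedup factors are
    invented to simulate that some streamliners help, some hurt,
    and overly aggressive ones lose all solutions."""
    speedup = {"alldifferent_extra": 0.2, "symmetry_break_lex": 0.5,
               "fix_first_var": 0.9, "restrict_domain_half": 1.5,
               "no_op": 1.0}[streamliner]
    runtime = instance["base_time"] * speedup
    solved = speedup <= 1.0  # slowdown here models a streamliner that misfires
    return solved, runtime

def streamliner_search(train_instances, budget_candidates=10):
    """Generate-and-test loop in the spirit of Algorithm 1: repeatedly
    request a candidate streamliner, score it by total runtime over the
    n_train easy training instances, and keep the best candidate that
    still solves every training instance."""
    best, best_score, seen = None, float("inf"), set()
    for _ in range(budget_candidates):
        cand = propose_streamliner("hypergraph coloring", seen)
        seen.add(cand)
        total, ok = 0.0, True
        for inst in train_instances:
            solved, runtime = solve_with_streamliner(inst, cand)
            if not solved:
                ok = False
                break
            total += runtime
        if ok and total < best_score:
            best, best_score = cand, total
    return best, best_score

# Usage: easy training instances, each solvable in under 10 seconds
train = [{"base_time": 2.0}, {"base_time": 4.0}]
best, score = streamliner_search(train)
```

In the real system the scoring and timeout handling are richer (the search itself runs for about ten minutes), but the control flow above captures the core idea: candidate streamliners are cheap to evaluate because the training instances are deliberately easy.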