Online Prompt Selection for Program Synthesis
Authors: Yixuan Li, Lewis Frampton, Federico Mora, Elizabeth Polgreen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement an instance of our approach, called CYANEA, and evaluate it on synthesis queries from the literature in ranking function synthesis, from the syntax-guided synthesis competition, and fresh, unseen queries generated from SMT problems. CYANEA solves 37.2% more queries than the best single solver and achieves results within 4% of the virtual best solver. |
| Researcher Affiliation | Academia | Yixuan Li¹, Lewis Frampton¹, Federico Mora², Elizabeth Polgreen¹ (¹University of Edinburgh, UK; ²University of California, Berkeley, USA) |
| Pseudocode | No | The paper uses flowcharts (Figure 2, Figure 3) to illustrate the system architecture and describes components in prose, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states, "We implement an instance of our approach, called CYANEA..." but does not provide any explicit statement about releasing the source code or a link to a code repository. |
| Open Datasets | Yes | We evaluate our approach on synthesis queries from the Syntax-Guided Synthesis competition (Alur et al. 2024), ranking function synthesis (Giacobbe, Kroening, and Parsert 2022), and queries automatically generated from the SMT competition (Parsert and Polgreen 2024). |
| Dataset Splits | No | The paper mentions, "The total number of queries is 1269. We report the average scores of 20 runs, with the queries randomly shuffled in each run." This describes a re-sampling strategy for evaluation but does not specify explicit train/validation/test splits with percentages or counts. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions "cvc5 (Barbosa et al. 2022) as the SMT solver" and uses GPT (gpt-3.5-turbo-0125) and Llama (Meta-Llama-3-70B). The LLM model identifiers are specific, but cvc5 is cited without an explicit version number, and no versioned software library dependencies are listed. |
| Experiment Setup | Yes | We set a total timeout of T = 100 seconds and a total cost budget of C = 100,000. For all k-NN predictors, we set k = 15. |
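The k-NN predictors mentioned in the setup row can be illustrated with a minimal sketch. This is not the paper's implementation: the feature representation, distance metric, and labels (here, which solver/prompt worked best on each past query) are all assumptions for illustration; only k = 15 comes from the paper.

```python
import math
from collections import Counter

def knn_predict(history, query_vec, k=15):
    """Pick the label that appears most often among the k past queries
    nearest to `query_vec` (Euclidean distance over feature vectors).

    `history` is a list of (feature_vector, label) pairs, where the
    label might be the solver or prompt that performed best on that
    query. All of this structure is illustrative, not the paper's.
    """
    # Sort past queries by distance to the new query's feature vector.
    ranked = sorted(history, key=lambda item: math.dist(item[0], query_vec))
    # Majority vote among the k nearest neighbours.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

For example, with a toy history of four queries labelled by best solver, `knn_predict(history, (0, 0.5), k=3)` returns the solver favoured by the two closest neighbours.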