Online Prompt Selection for Program Synthesis

Authors: Yixuan Li, Lewis Frampton, Federico Mora, Elizabeth Polgreen

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We implement an instance of our approach, called CYANEA, and evaluate it on synthesis queries from the literature in ranking function synthesis, from the syntax-guided synthesis competition, and fresh, unseen queries generated from SMT problems. CYANEA solves 37.2% more queries than the best single solver and achieves results within 4% of the virtual best solver."
Researcher Affiliation | Academia | Yixuan Li (University of Edinburgh, UK), Lewis Frampton (University of Edinburgh, UK), Federico Mora (University of California, Berkeley, USA), Elizabeth Polgreen (University of Edinburgh, UK)
Pseudocode | No | The paper uses flowcharts (Figures 2 and 3) to illustrate the system architecture and describes components in prose, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper states, "We implement an instance of our approach, called CYANEA...", but provides no explicit statement about releasing the source code and no link to a code repository.
Open Datasets | Yes | "We evaluate our approach on synthesis queries from the Syntax-Guided Synthesis competition (Alur et al. 2024), ranking function synthesis (Giacobbe, Kroening, and Parsert 2022), and queries automatically generated from the SMT competition (Parsert and Polgreen 2024)."
Dataset Splits | No | The paper states, "The total number of queries is 1269. We report the average scores of 20 runs, with the queries randomly shuffled in each run." This describes a re-sampling strategy for evaluation but does not specify explicit train/validation/test splits with percentages or counts.
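The shuffled-runs protocol quoted above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the function and parameter names (`evaluate`, `run_portfolio`) are assumptions, and the scoring here is a simple solved-query count.

```python
import random
import statistics

def evaluate(queries, run_portfolio, runs=20, seed=0):
    """Average a solved-query count over repeated runs, shuffling the
    query order each run (illustrative sketch of the quoted protocol)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        shuffled = queries[:]          # copy so the caller's list is untouched
        rng.shuffle(shuffled)          # fresh random order per run
        solved = sum(1 for q in shuffled if run_portfolio(q))
        scores.append(solved)
    return statistics.mean(scores)
```

Shuffling matters because an online selector's predictions depend on which queries it has already seen; averaging over 20 orders removes that order dependence from the reported score.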
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models or memory specifications) used for running the experiments.
Software Dependencies | No | The paper names "cvc5 (Barbosa et al. 2022) as the SMT solver" and uses GPT (gpt-3.5-turbo-0125) and Llama (Meta-Llama-3-70B). The LLM identifiers are specific, but cvc5 is cited without a version number, and no versioned software library dependencies are listed.
Experiment Setup | Yes | "We set a total timeout of T = 100 seconds and a total cost budget of C = 100,000. For all k-NN predictors, we set k = 15."
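For readers unfamiliar with k-NN predictors in this setting, a minimal sketch of k-nearest-neighbour solver selection is shown below. This is not CYANEA's actual interface: `history`, `knn_predict`, and the feature representation are illustrative assumptions; only the default k = 15 comes from the paper.

```python
import math
from collections import Counter

def knn_predict(history, features, k=15):
    """Pick the solver that won most often on the k past queries whose
    feature vectors are closest to `features` (Euclidean distance).
    `history` is a list of (feature_vector, winning_solver) pairs;
    all names here are illustrative, not CYANEA's actual API."""
    nearest = sorted(history, key=lambda h: math.dist(h[0], features))[:k]
    votes = Counter(solver for _, solver in nearest)
    return votes.most_common(1)[0][0]   # majority vote among neighbours
```

Usage: with a history of past queries clustered around two feature values, a new query near either cluster is routed to that cluster's winning solver.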