Eco Search: A No-delay Best-First Search Algorithm for Program Synthesis
Authors: Théo Matricon, Nathanaël Fijalkow, Guillaume Lagarde
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of ECO SEARCH in two classic domains: the DeepCoder (Balog et al. 2017) domain of integer list manipulations and the FlashFill (Gulwani 2011) domain of string manipulations. In our experiments, ECO SEARCH solves twice as many tasks in the same amount of time as previous methods. [...] Q1: Does ECO SEARCH improve the performance of enumerative approaches on program synthesis tasks? Q2: How does the performance of these algorithms scale with the complexity of the grammar? Datasets. We consider two classic domains: string manipulations and integer list manipulations. |
| Researcher Affiliation | Academia | 1 LaBRI, CNRS, France; 2 LaBRI, Université de Bordeaux, France |
| Pseudocode | Yes | The full description and pseudocode is given in the extended version which also includes a complete description and pseudocode with worked out examples of the two predecessors HEAP SEARCH and BEE SEARCH. [...] Algorithm 1 The generation part of BEE SEARCH [...] Algorithm 2 The generation part in ECO SEARCH [...] Algorithm 3 Update part in ECO SEARCH |
| Open Source Code | Yes | Code: https://github.com/SynthesisLab/DeepSynth2/tree/eco_search_aaai [...] The code is made available as supplementary material; it contains the seeds used, the cost functions, and all other additional minor experimental details. |
| Open Datasets | Yes | For string manipulations we use the same setting as in BEE SEARCH (Ameen and Lelis 2023): FlashFill's 205 tasks from SyGuS. [...] For integer list manipulation we use the DeepCoder (Balog et al. 2017) dataset comprised of 366 tasks. |
| Dataset Splits | No | The paper mentions using specific datasets (Flash Fill, Deep Coder) and evaluating performance on a set of 'tasks'. However, it does not explicitly provide training/test/validation splits (percentages, counts, or specific methodology) for these datasets or for the 'synthetic dataset' mentioned for training neural networks. |
| Hardware Specification | Yes | All experiments were run on a 16 GB RAM machine with an Intel Xeon(R) W-1270 CPU running at up to 3.40GHz, running Ubuntu Jellyfish (no GPUs were used). |
| Software Dependencies | No | The paper states, 'All algorithms are re-implemented in Python.' However, it does not specify the version of Python or any other key software libraries or frameworks with their specific version numbers. |
| Experiment Setup | Yes | We run all best-first search algorithms on our benchmarks with a timeout of five minutes (300s) per task. [...] For BEE SEARCH we follow the original implementation and round off cost values to 10^-2 in log space (since our cost functions are probabilities). For ECO SEARCH we need to discretize costs, as follows. We discretize probabilities in log space up to 10^-5. By default we use a bucket size of 20. We also experiment with other values, and for comparison, we also consider ECO SEARCH without buckets. |
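The cost discretization described in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' implementation: the constants mirror the reported settings (log-space resolution 10^-5, bucket size 20), but the function names and the exact rounding convention are assumptions.

```python
import math

LOG_RESOLUTION = 1e-5   # paper: probabilities discretized in log space up to 10^-5
BUCKET_SIZE = 20        # paper: default bucket size of 20

def discretize_cost(probability: float) -> int:
    """Map a probability-based cost onto an integer grid in log space.

    Lower probability => higher cost. The rounding convention here is an
    illustrative assumption; the paper only states the resolution.
    """
    cost = -math.log(probability)
    return round(cost / LOG_RESOLUTION)

def bucket_index(discrete_cost: int) -> int:
    """Group neighbouring discrete costs into fixed-size buckets,
    as ECO SEARCH does by default (bucket size 20)."""
    return discrete_cost // BUCKET_SIZE
```

With bucketing, programs whose costs fall in the same bucket are enumerated together, trading exact best-first order for cheaper queue operations; the "without buckets" variant in the table corresponds to a bucket size of 1.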