A Divide, Align and Conquer Strategy For Program Synthesis
Authors: Jonas Witt, Sebastijan Dumančić, Tias Guns, Claus-Christian Carbon
JAIR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate BEN, our implementation of DA&C using standard top-down enumerative synthesis, across two domains: abstract visual reasoning, used as a running example throughout the paper (Section 5.2), and string transformation tasks (Section 5.1). Table 4: Results on the training part of the Abstraction and Reasoning Corpus (ARC). Figure 8: Performance on the real-world string transformations data set. 5.2.2 Ablation Analysis |
| Researcher Affiliation | Academia | Jonas Witt EMAIL University of Bamberg, Markusplatz 3, 96047 Bamberg, Germany Sebastijan Dumančić EMAIL TU Delft, Van Mourik Broekmanweg 6, 2628 XE Delft, The Netherlands Tias Guns EMAIL KU Leuven, Celestijnenlaan 200a, 3001 Leuven, Belgium Claus-Christian Carbon EMAIL University of Bamberg, Markusplatz 3, 96047 Bamberg, Germany |
| Pseudocode | Yes | Algorithm 1 Overview Divide-Align & Conquer Algorithm 2 Rank Correspondences Algorithm 3 Constrained DNF learner |
| Open Source Code | Yes | Our code and an overview of the tasks solved by BEN are available on GitHub. |
| Open Datasets | Yes | We experiment with a publicly available data set of 130 real-world string transformation tasks from Cropper et al. (2020). Materials. We use the training part of the Abstraction and Reasoning Corpus (ARC) (Chollet, 2020) (Apache 2.0 license), which consists of 400 data sets that each contain 2–10 examples comprising an input and an output, where the outputs are generated by an unspecified program that we wish to synthesize. |
| Dataset Splits | Yes | Its predictive accuracy on tasks with three training examples is not significantly different from its predictive accuracy on tasks with nine training examples (Q1). The reason for this is that BEN only performs synthesis on component tuples rather than sets of training examples, so the complexity of a single synthesis loop is independent of the total number of training examples in a task. BEN solves over 80% of test examples given 2-3 training examples per task. |
| Hardware Specification | Yes | All experiments are run on a desktop with a single M1 Max CPU and 64GB of RAM. |
| Software Dependencies | No | The paper mentions a "Python implementation" for BEN and "C++" for the Kaggle winner, but it does not specify versions for these languages or name any specific libraries or solvers with version numbers. |
| Experiment Setup | Yes | The #1 Kaggle agent and BEN were both run at a search depth of four on all tasks. During the actual Kaggle competition, the #1 Kaggle agent ran as an ensemble at different search depths to optimize scheduling. We report more challenging results at a search depth of four. The ARGA agent does not expect a search depth as input; we use a time budget of 2 min. All agents were executed with a time budget of 60s per task. |