reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Relational Decomposition for Program Synthesis

Authors: Céline Hocquette, Andrew Cropper

IJCAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate our approach using an off-the-shelf inductive logic programming (ILP) system on four challenging synthesis datasets. Our results show that (i) our representation can outperform a standard one, and (ii) an off-the-shelf ILP system with our representation can outperform domain-specific approaches.
Researcher Affiliation	Academia	Céline Hocquette (1) and Andrew Cropper (2); (1) University of Southampton, (2) University of Oxford
Pseudocode	Yes	Algorithm 1 Example Decomposition
    def decompose(E, D):
        E+, E-, B = {}, {}, {}
        id = 0
        for i ↦ o in E:
            id += 1
            for x in i:
                let (I1, ..., In) be the position of x in i
                let V be the value of x in i
                B += in(id, I1, ..., In, V)
            for y in o:
                let (I1, ..., Im) be the position of y in o
                let V be the value of y in o
                E+ += out(id, I1, ..., Im, V)
                for W in D:
                    if W ≠ V:
                        E- += out(id, I1, ..., Im, W)
        return E+, E-, B
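To make the decomposition concrete, here is a minimal Python sketch of Algorithm 1. It assumes that each example is an (input, output) pair of nested Python lists (a 1D list or a 2D grid), that D is a finite collection of cell values, and that a fact such as in(id, I1, ..., In, V) is encoded as the tuple ("in", id, I1, ..., In, V). The helper `positions` and this tuple encoding are illustrative choices, not taken from the authors' repository. A bare string counts as a single leaf, so string tasks would need to be passed as `list("abc")`.

```python
from typing import Any, Iterable, Iterator

# Assumed (hypothetical) encoding: in(id, I1, ..., In, V) -> ("in", id, I1, ..., In, V)
Fact = tuple


def positions(value: Any, prefix: tuple = ()) -> Iterator[tuple[tuple, Any]]:
    """Yield (index_tuple, leaf_value) for every leaf of a nested list or tuple."""
    if isinstance(value, (list, tuple)):
        for idx, item in enumerate(value):
            yield from positions(item, prefix + (idx,))
    else:
        yield prefix, value


def decompose(examples: Iterable[tuple[Any, Any]], domain: Iterable[Any]):
    """Split (input, output) pairs into positive examples, negative examples and background facts."""
    domain = list(domain)
    pos: set = set()
    neg: set = set()
    bk: set = set()
    for ex_id, (inp, out) in enumerate(examples, start=1):
        # Every input cell becomes a background fact in(id, I1, ..., In, V).
        for idx, v in positions(inp):
            bk.add(("in", ex_id, *idx, v))
        # Every output cell becomes a positive example out(id, I1, ..., Im, V) ...
        for idx, v in positions(out):
            pos.add(("out", ex_id, *idx, v))
            # ... and every other domain value at that position becomes a negative example.
            for w in domain:
                if w != v:
                    neg.add(("out", ex_id, *idx, w))
    return pos, neg, bk


if __name__ == "__main__":
    pos, neg, bk = decompose([([1, 0, 0], [0, 1, 0])], {0, 1})
    print(sorted(bk))
    print(sorted(pos))
    print(sorted(neg))
```

Each output cell yields one positive and |D| - 1 negatives, so the number of negatives grows with the size of the value domain.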
Open Source Code	Yes	The experimental code and data are available at https://github.com/celinehocquette/ijcai25-relational-decomposition.
Open Datasets	Yes	1D-ARC. The 1D-ARC dataset [Xu et al., 2024] is a one-dimensional adaptation of ARC. ARC. The ARC dataset [Chollet, 2019] evaluates the ability to perform abstract reasoning and problem-solving from a small number of examples. Strings. This real-world dataset gathers user-provided examples from online forums and is inspired by a dataset of user-provided examples in Microsoft Excel [Gulwani, 2011]. List functions. This dataset [Rule, 2020; Rule et al., 2024] evaluates human and machine learning ability.
Dataset Splits	Yes	For the strings and list functions datasets, we perform leave-one-out cross-validation. For tasks 81 to 250 in the list functions dataset, due to the large number of constant values, we sample 10,000 negative examples per task.
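The split and sampling protocol described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the excerpt does not say how the 10,000 negative examples are drawn, so uniform sampling with a fixed seed is assumed here, and the helper names `leave_one_out` and `cap_negatives` are hypothetical.

```python
import random
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def leave_one_out(examples: Sequence[T]) -> Iterator[tuple[list[T], T]]:
    """Yield (train, held_out) pairs; each example is held out exactly once."""
    for k in range(len(examples)):
        train = [ex for j, ex in enumerate(examples) if j != k]
        yield train, examples[k]


def cap_negatives(negatives: set, limit: int = 10_000, seed: int = 0) -> set:
    """Uniformly subsample negatives down to `limit` (assumed sampling scheme)."""
    if len(negatives) <= limit:
        return set(negatives)
    rng = random.Random(seed)
    # Sort by repr first so the sample does not depend on set iteration order.
    return set(rng.sample(sorted(negatives, key=repr), limit))


if __name__ == "__main__":
    for train, held_out in leave_one_out(["ex1", "ex2", "ex3"]):
        print(train, "->", held_out)
    negs = {("out", 1, i, 0) for i in range(25_000)}
    print(len(cap_negatives(negs)))
```

Fixing the seed and sorting before sampling keeps the subsample identical across runs, which matters when results are meant to be reproduced.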
Hardware Specification	Yes	We use an Intel compute node with dual 2.0 GHz Intel Xeon Gold 6138 processors, 40 CPU cores, and 192 GB of DDR4 memory. Each system uses a single CPU.
Software Dependencies	No	The paper mentions several systems like POPPER, ARGA, METABIAS, BEN, and HL but does not provide specific version numbers for these systems or any underlying software libraries (e.g., Python, specific ML frameworks).
Experiment Setup	No	The paper mentions experimental setup for each research question but does not specify concrete hyperparameter values or training configurations like learning rates, batch sizes, or optimizer settings. It describes general evaluation metrics and cross-validation strategies but lacks specific training parameters.