GeoILP: A Synthetic Dataset to Guide Large-Scale Rule Induction

Authors: Si Chen, Richong Zhang, Xu Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that no existing method can solve GeoILP tasks. In addition, along with classic symbolic-form data, we provide image-form data to boost the development of the joint learning of neural perception and symbolic rule induction.
Researcher Affiliation | Academia | Si Chen [1], Richong Zhang [1, 2], Xu Zhang [3]. [1] SKLCCSE, Beihang University, Beijing, China; [2] Zhongguancun Laboratory, Beijing, China; [3] The National Computer Network Emergency Response Technical Team / Coordination Center of China (CNCERT/CC). EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the steps for ILP task synthesis and the logic preliminaries in narrative text without presenting them as structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states that the GeoILP dataset is available at a GitHub link, but it provides no explicit statement or link for the code used to synthesize the dataset, nor for any other original code developed for this paper.
Open Datasets | Yes | Therefore, we construct GeoILP [1], a large-scale dataset synthesized from plane geometry rules that help generate reference hypotheses involving various language biases. We first adopt a symbolic deduction engine to obtain target examples from the rules and determine the background knowledge and hypotheses by tracing back from the examples. [...] [1] Data is available at https://github.com/chensi99/GeoILP.
Dataset Splits | Yes | After deduction and traceback, we repeat the BK and target examples ten times, keeping the predicates unchanged but mapping every point to new, unique points. In other words, the initial group of points is duplicated into ten groups. The data are then divided into a training set and an evaluation set with an 8:2 split of the point groups.
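The duplicate-then-split procedure described in that row can be sketched as follows. This is a rough illustration, not code from the paper; the predicate and point names (`midpoint`, `coll`, `A`, `B`, `M`) and the `_g<k>` renaming scheme are invented for the example.

```python
def duplicate_groups(facts, n_groups=10):
    """Duplicate a group of ground facts n_groups times, keeping
    predicates unchanged but renaming every point so that each
    group uses fresh, group-unique points (illustrative scheme)."""
    groups = []
    for g in range(n_groups):
        renamed = [(pred, tuple(f"{p}_g{g}" for p in args))
                   for pred, args in facts]
        groups.append(renamed)
    return groups

def split_groups(groups, train_ratio=0.8):
    """Split point groups into train/eval sets by whole groups (8:2)."""
    k = int(len(groups) * train_ratio)
    return groups[:k], groups[k:]

# Illustrative ground facts over points A, B and midpoint M.
facts = [("midpoint", ("M", "A", "B")), ("coll", ("A", "B", "M"))]
groups = duplicate_groups(facts)          # ten renamed copies
train, evalset = split_groups(groups)     # 8 groups train, 2 eval
```

Splitting by whole point groups (rather than by individual facts) keeps every renamed copy of the geometry intact on one side of the split, which is what the described 8:2 division implies.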
Hardware Specification | Yes | Difflog [10] throws an out-of-memory error on a server with 500 GB of memory, an order of magnitude larger than in the original paper (64 GB).
Software Dependencies | Yes | We conduct experiments using Popper [9], enabling predicate invention, recursion, and noise handling. [...] [9] Version 4.3.0: https://github.com/logic-and-learning-lab/Popper/tree/v4.3.0. [...] We use the implementation and recommended parameter settings from https://github.com/petablox/difflog/tree/3c2d5218d9a0a1e200ebbf2d6a1e5d077fb18826.
Experiment Setup | Yes | Noise handling is turned on because GeoILP follows the open-world assumption (OWA), while Popper follows the closed-world assumption (CWA). The maximum number of variables in a rule is set to 12, the maximum value across the four levels. When conducting experiments on individual levels, we set the maximum number of body atoms and the maximum number of rules to the maximum values for that level.
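The OWA/CWA mismatch in that row is why noise handling matters, and a toy example makes it concrete. All facts and the transitivity-style derivation below are invented for illustration and have nothing to do with GeoILP's actual predicates or Popper's internals.

```python
# Facts observed in the (open-world) data:
observed = {("parallel", "a", "b"), ("parallel", "b", "c")}

def cwa_label(atom, observations):
    """Closed-world assumption: any atom not observed is false."""
    return atom in observations

# A correct hypothesis (e.g. transitivity of parallelism) derives a
# fact that is true in the open world but simply missing from the data:
derived = ("parallel", "a", "c")

# Under CWA, that derivation is scored as covering a "negative"
# example, so a strict learner would reject the correct hypothesis.
false_positive_under_cwa = cwa_label(derived, observed) is False
```

With noise handling enabled, the learner tolerates a bounded number of such apparent misclassifications, so hypotheses that entail facts absent from the open-world data are not automatically discarded.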