GeoILP: A Synthetic Dataset to Guide Large-Scale Rule Induction

Authors: Si Chen, Richong Zhang, Xu Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that no existing method can solve GeoILP tasks. In addition, along with classic symbolic-form data, we provide image-form data to boost the development of the joint learning of neural perception and symbolic rule induction.
Researcher Affiliation | Academia | Si Chen [1], Richong Zhang [1, 2], Xu Zhang [3]. [1] SKLCCSE, Beihang University, Beijing, China; [2] Zhongguancun Laboratory, Beijing, China; [3] The National Computer Network Emergency Response Technical Team / Coordination Center of China (CNCERT/CC). EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the steps for ILP task synthesis and the logic preliminaries in narrative text without presenting them as structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states that the GeoILP dataset is available at a GitHub link, but it provides no explicit statement or link for the code used to synthesize the dataset, nor for any other original code developed for this paper.
Open Datasets | Yes | Therefore, we construct GeoILP [1], a large-scale dataset synthesized from plane geometry rules that help generate reference hypotheses involving various language biases. We first adopt a symbolic deduction engine to obtain target examples from the rules and determine the background knowledge and hypotheses by tracing back from the examples. [...] [1] Data is available at https://github.com/chensi99/GeoILP.
Dataset Splits | Yes | After deduction and traceback, we repeat the BK and target examples ten times, keeping the predicates unchanged but mapping every point to new, unique points. In other words, the initial group of points is duplicated into ten groups. The data are then divided into a training set and an evaluation set with an 8:2 split of the point groups.
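The duplicate-then-split procedure described in that row can be sketched as follows. This is a rough illustration, not code from the paper; the predicate and point names (`midpoint`, `coll`, `A`, `B`, `M`) and the `_g<k>` renaming scheme are invented for the example.

```python
def duplicate_groups(facts, n_groups=10):
    """Duplicate a group of ground facts n_groups times, keeping
    predicates unchanged but renaming every point so that each
    group uses fresh, group-unique points (illustrative scheme)."""
    groups = []
    for g in range(n_groups):
        renamed = [(pred, tuple(f"{p}_g{g}" for p in args))
                   for pred, args in facts]
        groups.append(renamed)
    return groups

def split_groups(groups, train_ratio=0.8):
    """Split point groups into train/eval sets by whole groups (8:2)."""
    k = int(len(groups) * train_ratio)
    return groups[:k], groups[k:]

# Illustrative ground facts over points A, B and midpoint M.
facts = [("midpoint", ("M", "A", "B")), ("coll", ("A", "B", "M"))]
groups = duplicate_groups(facts)          # ten renamed copies
train, evalset = split_groups(groups)     # 8 groups train, 2 eval
```

Splitting by whole point groups (rather than by individual facts) keeps every renamed copy of the geometry intact on one side of the split, which is what the described 8:2 division implies.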
Hardware Specification | Yes | Difflog [10] throws an out-of-memory error on a server with 500 GB of memory, an order of magnitude larger than in the original paper (64 GB).
Software Dependencies | Yes | We conduct experiments using Popper [9], enabling predicate invention, recursion, and noise handling. [...] [9] Version 4.3.0: https://github.com/logic-and-learning-lab/Popper/tree/v4.3.0. [...] We use the implementation and recommended parameter settings from https://github.com/petablox/difflog/tree/3c2d5218d9a0a1e200ebbf2d6a1e5d077fb18826.
Experiment Setup | Yes | Noise handling is turned on because GeoILP follows the open-world assumption (OWA), while Popper follows the closed-world assumption (CWA). The maximum number of variables in a rule is set to 12, the maximum value across the four levels. When conducting experiments on individual levels, we set the maximum number of body atoms and the maximum number of rules to the maximum values for that level.
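The OWA/CWA mismatch in that row is why noise handling matters, and a toy example makes it concrete. All facts and the transitivity-style derivation below are invented for illustration and have nothing to do with GeoILP's actual predicates or Popper's internals.

```python
# Facts observed in the (open-world) data:
observed = {("parallel", "a", "b"), ("parallel", "b", "c")}

def cwa_label(atom, observations):
    """Closed-world assumption: any atom not observed is false."""
    return atom in observations

# A correct hypothesis (e.g. transitivity of parallelism) derives a
# fact that is true in the open world but simply missing from the data:
derived = ("parallel", "a", "c")

# Under CWA, that derivation is scored as covering a "negative"
# example, so a strict learner would reject the correct hypothesis.
false_positive_under_cwa = cwa_label(derived, observed) is False
```

With noise handling enabled, the learner tolerates a bounded number of such apparent misclassifications, so hypotheses that entail facts absent from the open-world data are not automatically discarded.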