GeoILP: A Synthetic Dataset to Guide Large-Scale Rule Induction
Authors: Si Chen, Richong Zhang, Xu Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that no existing method can solve GeoILP tasks. In addition, along with classic symbolic-form data, we provide image-form data to boost the development of the joint learning of neural perception and symbolic rule induction. |
| Researcher Affiliation | Academia | Si Chen¹, Richong Zhang¹˒², Xu Zhang³. ¹SKLCCSE, Beihang University, Beijing, China; ²Zhongguancun Laboratory, Beijing, China; ³The National Computer Network Emergency Response Technical Team / Coordination Center of China (CNCERT/CC). EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the steps for ILP task synthesis and the logic preliminaries in narrative text without presenting them as structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states that the GeoILP dataset is available at a GitHub link, but it does not provide an explicit statement or link for the open-source code of the methodology used to synthesize the dataset or for any other original code developed for this paper. |
| Open Datasets | Yes | Therefore, we construct GeoILP¹, a large-scale dataset synthesized from plane geometry rules that help generate reference hypotheses involving various language biases. We first adopt a symbolic deduction engine to obtain target examples from the rules and determine the background knowledge and hypotheses by tracing back from the examples. [...] ¹Data is available at https://github.com/chensi99/GeoILP. |
| Dataset Splits | Yes | After deduction and traceback, we repeat the BK and target examples ten times, retaining predicates unchanged but mapping every point to new, unique points. In other words, the initial group of points is duplicated into ten groups. Then, the data are divided into a training set and an evaluation set by point group, in an 8:2 ratio. |
| Hardware Specification | Yes | Difflog¹⁰ throws an out-of-memory error on a server with 500GB of memory, an order of magnitude larger than in the original paper (64GB). |
| Software Dependencies | Yes | We conduct experiments using Popper⁹, enabling predicate invention, recursion and noise handling. [...] ⁹Version 4.3.0: https://github.com/logic-and-learning-lab/Popper/tree/v4.3.0. [...] We leverage the implementation and recommended parameter settings in https://github.com/petablox/difflog/tree/3c2d5218d9a0a1e200ebbf2d6a1e5d077fb18826. |
| Experiment Setup | Yes | Noise handling is turned on because GeoILP follows OWA, while Popper follows CWA. The maximum number of variables in a rule is set to 12, which is the maximum value across all four levels. When conducting experiments on different levels, we set the maximum number of body atoms and the maximum number of rules to the maximum values of the learning level. |
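
The split described in the Dataset Splits row can be illustrated with a small sketch. It is not the authors' code: the fact representation, the predicate names (`coll`, `perp`), the point-renaming scheme, and the choice of the first eight groups for training are all assumptions made only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical fact format: (predicate name, tuple of point names).
Fact = Tuple[str, Tuple[str, ...]]


def duplicate_groups(facts: List[Fact], n_groups: int = 10) -> Dict[int, List[Fact]]:
    """Copy the facts n_groups times, keeping predicates and renaming points per group."""
    groups: Dict[int, List[Fact]] = {}
    for g in range(n_groups):
        # Assumed renaming scheme: suffix each point with its group index.
        groups[g] = [(pred, tuple(f"{p}_g{g}" for p in args)) for pred, args in facts]
    return groups


def split_groups(groups: Dict[int, List[Fact]], n_train: int = 8) -> Tuple[List[Fact], List[Fact]]:
    """Assign whole point groups to training or evaluation (8:2 with ten groups)."""
    train = [f for g in sorted(groups) if g < n_train for f in groups[g]]
    evaluation = [f for g in sorted(groups) if g >= n_train for f in groups[g]]
    return train, evaluation


# Toy facts with made-up predicate names.
facts: List[Fact] = [("coll", ("a", "b", "c")), ("perp", ("a", "b", "c", "d"))]
groups = duplicate_groups(facts)
train, evaluation = split_groups(groups)
print(len(train), len(evaluation))
```

Because the split is made over whole point groups, no point name appears in both sets, while the predicates are shared.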
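
The size limits in the Experiment Setup row could be written as Popper bias declarations roughly as follows. This is a hedged sketch rather than the authors' configuration: the helper `popper_bias_limits` is hypothetical, the directive names (`max_vars`, `max_body`, `max_clauses`, `enable_pi`, `enable_recursion`) are taken from our reading of the Popper README and should be checked against v4.3.0, and the values 4 and 3 are placeholders because the per-level maxima are not given here. Noise handling is configured separately and is not shown.

```python
def popper_bias_limits(max_body: int, max_rules: int, max_vars: int = 12) -> str:
    """Build the size-limit part of a Popper bias file as text (hypothetical helper)."""
    lines = [
        f"max_vars({max_vars}).",      # 12 variables per rule, as stated in the table
        f"max_body({max_body}).",      # per-level maximum number of body atoms
        f"max_clauses({max_rules}).",  # per-level maximum number of rules
        "enable_pi.",                  # predicate invention
        "enable_recursion.",           # recursion
    ]
    return "\n".join(lines) + "\n"


# Placeholder per-level values; the real maxima are not stated in this summary.
bias_text = popper_bias_limits(max_body=4, max_rules=3)
print(bias_text)
```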