Learning Explanatory Rules from Noisy Data

Authors: Richard Evans, Edward Grefenstette

JAIR 2018

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. We implemented our model in TensorFlow (Abadi et al., 2016) and tested it with three types of experiment. First, we used standard symbolic ILP tasks, where ∂ILP is given discrete, error-free input. Second, we modified the standard symbolic ILP tasks so that a certain proportion of the positive and negative examples are wilfully mislabelled. Third, we tested it with fuzzy, ambiguous data, connecting ∂ILP to the output of a pretrained convolutional neural network that classifies MNIST digits.
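The second condition above (wilfully mislabelling a fixed proportion of the positive and negative examples) could be sketched as follows. The function name and the swap-based corruption scheme are illustrative assumptions, not the paper's exact procedure:

```python
import random

def corrupt_labels(pos, neg, proportion, seed=0):
    """Mislabel `proportion` of the positive and negative examples by
    swapping them between the two sets (illustrative sketch only)."""
    rng = random.Random(seed)
    pos, neg = list(pos), list(neg)
    flip_pos = set(rng.sample(range(len(pos)), int(len(pos) * proportion)))
    flip_neg = set(rng.sample(range(len(neg)), int(len(neg) * proportion)))
    new_pos = ([e for i, e in enumerate(pos) if i not in flip_pos]
               + [e for i, e in enumerate(neg) if i in flip_neg])
    new_neg = ([e for i, e in enumerate(neg) if i not in flip_neg]
               + [e for i, e in enumerate(pos) if i in flip_pos])
    return new_pos, new_neg
```

With `proportion=0.2`, exactly 20% of each set ends up carrying the wrong label while the overall sizes of the positive and negative sets are preserved.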
Researcher Affiliation: Industry. Richard Evans (EMAIL), Edward Grefenstette (EMAIL); DeepMind, London, UK.
Pseudocode: No. The paper describes the methodology using narrative text, mathematical formulations, and architecture diagrams (e.g., Figure 1). There are no explicitly labelled pseudocode or algorithm blocks with structured, code-like steps.
Open Source Code: No. The paper states: "We implemented our model in TensorFlow (Abadi et al., 2016)". This only indicates that TensorFlow was used for the implementation, not that the authors' code for this work is openly available. There is no explicit statement about releasing the code, and no link to a repository.
Open Datasets: Yes. We tested ∂ILP on 20 ILP tasks, taken from four domains: arithmetic, lists, group theory, and family-tree relations. Some of the arithmetic examples appeared in the work of Cropper and Muggleton (2016). The list examples are used by Feser, Chaudhuri, and Dillig (2015). The family tree dataset comes from Wang, Mazaitis, and Cohen (2015) and is also used by Yang, Yang, and Cohen (2016). [...] Unlike symbolic ILP systems, ∂ILP is also able to handle ambiguous or fuzzy data. We tested ∂ILP by connecting it to a convolutional net trained on MNIST digits, and it was still able to learn effectively (see Section 5.5).
Dataset Splits: Yes. For validation and test, we use positive and negative examples of the even predicate on numbers greater than 10. [...] The training data was integers from 100 to 1024. The integers below 100 were held out as test data. [...] We ran the less-than experiment while holding out certain pairs of integers. Please note that we are not just holding out pairs of images. Rather, we are holding out pairs of integers, and removing from training every pair of images whose labels match that pair. [...] its performance is robust when holding out 70% of the data.
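The held-out-pairs split quoted above (holding out pairs of integers, not merely pairs of images) could be constructed along these lines. The function name and the random sampling of label pairs are assumptions for illustration, not the paper's exact code:

```python
import itertools
import random

def split_image_pairs(images, held_out_fraction, max_int=9, seed=0):
    """Hold out a fraction of (integer, integer) label pairs, then drop
    from training every *image* pair whose labels match a held-out pair.
    `images` is a list of (image, label) tuples (illustrative sketch)."""
    rng = random.Random(seed)
    label_pairs = list(itertools.product(range(max_int + 1), repeat=2))
    rng.shuffle(label_pairs)
    held_out = set(label_pairs[:int(len(label_pairs) * held_out_fraction)])
    train, test = [], []
    for (a, la), (b, lb) in itertools.product(images, repeat=2):
        # Route by the *label* pair: no image pair whose labels form a
        # held-out integer pair ever reaches the training set.
        (test if (la, lb) in held_out else train).append(((a, la), (b, lb)))
    return train, test, held_out
```

This is what distinguishes the split from an ordinary random split over image pairs: every image realisation of a held-out integer pair is excluded from training, so generalisation to held-out pairs cannot come from having seen other images with the same labels.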
Hardware Specification: No. We gave Metagol and ∂ILP the same fixed time limit (24 hours running on a standard workstation). This statement mentions a "standard workstation" but lacks specific details regarding CPU model, GPU type, or memory, which are necessary for hardware reproducibility.
Software Dependencies: No. We implemented our model in TensorFlow (Abadi et al., 2016). The paper mentions the use of TensorFlow but does not specify a version number for it or any other software component, which is necessary for reproducible dependency information.
Experiment Setup: Yes. We tried a range of optimisation algorithms: Stochastic Gradient Descent, Adam, AdaDelta, and RMSProp. We searched across a range of learning rates in {0.5, 0.2, 0.1, 0.05, 0.01, 0.001}. Weights were initialised randomly from a normal distribution with mean 0 and a standard deviation that ranged between 0 and 2 (the standard deviation was a hyperparameter; the mean was fixed). [...] we used RMSProp with a learning rate of 0.5, and initialised clause weights by sampling from a N(0, 1) distribution. [...] We train for 6000 steps, adjusting rule weights to minimise cross-entropy loss as described above. [...] Each step we sample a mini-batch from the positive and negative examples.
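The quoted hyperparameters (RMSProp at learning rate 0.5, weights drawn from N(0, 1), cross-entropy loss, a fresh mini-batch each of 6000 steps) can be sketched as a runnable loop. The logistic model and synthetic data below are stand-ins for ∂ILP's differentiable forward chaining, and the RMSProp decay and epsilon values are assumed defaults, not values reported in the paper:

```python
import numpy as np

# Toy, runnable sketch of the quoted setup: N(0, 1) weight
# initialisation, hand-rolled RMSProp with learning rate 0.5, and
# cross-entropy loss minimised over 6000 mini-batch steps.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))               # synthetic features (assumed)
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # synthetic labels (assumed)

w = rng.normal(0.0, 1.0, size=4)            # N(0, 1) initialisation
lr, decay, eps = 0.5, 0.9, 1e-8             # RMSProp hyperparameters
ms = np.zeros_like(w)                       # running mean of squared grads

for step in range(6000):
    idx = rng.choice(len(X), size=32)       # sample a mini-batch each step
    xb, yb = X[idx], y[idx]
    p = 1.0 / (1.0 + np.exp(-np.clip(xb @ w, -30.0, 30.0)))
    grad = xb.T @ (p - yb) / len(xb)        # gradient of cross-entropy loss
    ms = decay * ms + (1 - decay) * grad ** 2
    w -= lr * grad / (np.sqrt(ms) + eps)    # RMSProp update
```

A learning rate of 0.5 would be very large for plain SGD, but RMSProp normalises each coordinate by the root of its running squared-gradient average, so per-step updates stay on the order of the learning rate itself.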