DeepMath - Deep Sequence Models for Premise Selection
Authors: Geoffrey Irving, Christian Szegedy, Alexander A. Alemi, Niklas Een, François Chollet, Josef Urban
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study the effectiveness of neural sequence models for premise selection in automated theorem proving, one of the main bottlenecks in the formalization of mathematics. We propose a two-stage approach for this task that yields good results for the premise selection task on the Mizar corpus while avoiding the hand-engineered features of existing state-of-the-art models. To our knowledge, this is the first time deep learning has been applied to theorem proving on a large scale. (Abstract) ... 6 Experiments (Section Title) |
| Researcher Affiliation | Collaboration | Alexander A. Alemi Google Inc. EMAIL François Chollet Google Inc. EMAIL Niklas Een Google Inc. EMAIL Geoffrey Irving Google Inc. EMAIL Christian Szegedy Google Inc. EMAIL Josef Urban Czech Technical University in Prague EMAIL |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. It provides architectural diagrams but not algorithmic steps in pseudocode format. |
| Open Source Code | No | The paper mentions using the 'TensorFlow framework [1] and the Keras library [5]' (Section 6.4), but it does not state that the authors are releasing their own source code for the methodology described in the paper. |
| Open Datasets | Yes | We use the Mizar Mathematical Library (MML) version 4.181.11473 as the formal corpus and E prover [32] version 1.9 as the underlying ATP system. (Section 3) ... For training and evaluation we use a subset of 32,524 out of 57,917 theorems that are known to be provable by an ATP given the right set of premises. (Section 6.1) |
| Dataset Splits | Yes | We split off a random 10% of these (3,124 statements) for testing and validation. Also, we held out 400 statements from the 3,124 for monitoring training progress, as well as for model and checkpoint selection. (Section 6.1) |
| Hardware Specification | Yes | The neural networks were trained using asynchronous distributed stochastic gradient descent using the Adam optimizer [24] with up to 20 parallel NVIDIA K-80 GPU workers per model. (Section 6.4) |
| Software Dependencies | No | The paper mentions using the 'TensorFlow framework [1] and the Keras library [5]' (Section 6.4), but it does not specify version numbers for these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | All our neural network models use the general architecture from Fig 3: a classifier on top of the concatenated embeddings of an axiom and a conjecture. The same classifier architecture was used for all models: a fully-connected neural network with one hidden layer of size 1024. (Section 6.3) ... The character level models were trained with maximum sequence length 2048 characters, where the word-level (and definition embedding) based models had a maximum sequence length of 500 words. (Section 6.4) ... For E prover, we used auto strategy with a soft time limit of 90 seconds, a hard time limit of 120 seconds, a memory limit of 4 GB, and a processed clauses limit of 500,000. (Section 6.5) |
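The classifier described in Section 6.3 (a fully-connected network with one hidden layer of size 1024 applied to the concatenated axiom and conjecture embeddings) can be sketched as follows. This is a minimal illustration, not the authors' released code: the embedding dimension, initialization, ReLU nonlinearity, and sigmoid output are assumptions, since the paper fixes only the hidden-layer size and the concatenation structure.

```python
import numpy as np

EMBED_DIM = 256   # assumed embedding size; the paper does not fix this here
HIDDEN = 1024     # hidden layer size stated in Section 6.3

rng = np.random.default_rng(0)

# Randomly initialized weights stand in for trained parameters.
W1 = rng.standard_normal((2 * EMBED_DIM, HIDDEN)) * 0.01
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, 1)) * 0.01
b2 = np.zeros(1)

def relevance_score(axiom_emb, conjecture_emb):
    """Classifier on top of concatenated embeddings (cf. Fig 3):
    one hidden layer of size 1024, scalar relevance output."""
    x = np.concatenate([axiom_emb, conjecture_emb])
    h = np.maximum(0.0, x @ W1 + b1)        # assumed ReLU hidden layer
    logit = h @ W2 + b2
    return float(1.0 / (1.0 + np.exp(-logit)))  # assumed sigmoid output

score = relevance_score(rng.standard_normal(EMBED_DIM),
                        rng.standard_normal(EMBED_DIM))
```

With trained embedding models feeding the two inputs, thresholding this score would rank candidate premises for a given conjecture, which is the premise-selection step the table's setup quotes describe.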