Position: Formal Mathematical Reasoning—A New Frontier in AI
Authors: Kaiyu Yang, Gabriel Poesia, Jingxuan He, Wenda Li, Kristin E. Lauter, Swarat Chaudhuri, Dawn Song
ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This position paper advocates formal mathematical reasoning as an indispensable component in future AI for math, formal verification, and verifiable generation. We summarize existing progress, discuss open challenges, and envision critical milestones to measure future success. |
| Researcher Affiliation | Collaboration | 1Meta FAIR 2Stanford University 3UC Berkeley 4University of Edinburgh 5UT Austin. Correspondence to: Kaiyu Yang <EMAIL>, Dawn Song <EMAIL>. |
| Pseudocode | No | The paper describes conceptual architectures and workflows (e.g., Figure 5 "A common architecture for neural theorem proving") but does not present structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the source code related to its own methodology. It mentions various external tools and systems used by other research (e.g., Lean, SymPy, Alpha Proof) but does not release its own code. |
| Open Datasets | Yes | benchmarks such as GSM8K (Cobbe et al., 2021) and MATH (Hendrycks et al., 2021). ... The Proof Pile dataset (Azerbayev et al., 2024)... Numina Math: The largest public dataset in AI4Maths with 860k pairs of competition math problems and solutions. https://github.com/projectnumina/aimo-progress-prize/blob/main/report/numina_dataset.pdf, 2024a. |
| Dataset Splits | No | The paper is a position paper and does not describe specific experiments or provide dataset split information for its own work. It refers to existing benchmarks and datasets (e.g., MATH, GSM8K) but does not detail how data would be split for a specific experimental setup within this paper. |
| Hardware Specification | No | The paper does not provide specific hardware details for any experiments conducted by the authors, as it is a position paper summarizing existing progress and future directions. |
| Software Dependencies | No | The paper discusses various formal systems and external tools (e.g., Lean, Sym Py) in a general context but does not specify particular software dependencies with version numbers required to reproduce any work presented within this position paper. |
| Experiment Setup | No | The paper does not provide specific experimental setup details, hyperparameters, or training configurations, as it is a position paper outlining perspectives and challenges, not presenting new experimental results. |