reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Quantified Linear and Polynomial Arithmetic Satisfiability via Template-based Skolemization

Authors: Krishnendu Chatterjee, Ehsan Kafshdar Goharshady, Mehrdad Karrabi, Harshit J Motwani, Maximilian Seeliger, Đorđe Žikelić

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our method offers a range of appealing theoretical properties combined with a strong practical performance. On the theory side, our method is sound, semi-complete, and runs in subexponential time and polynomial space... On the practical side, our experiments show superior performance compared to stateof-the-art SMT solvers in terms of the number of solved instances and runtime, both on LRA and on NRA benchmarks. We implemented a prototype of our method for satisfiability checking in a tool called Quanti SAT1, and compared it against two state-of-the-art SMT solvers that support satisfiability checking for quantified LRA and NRA formulas.
Researcher Affiliation	Academia	1Institute of Science and Technology Austria, Klosterneuburg, Austria 2Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 3Vienna University of Technology, Vienna, Austria 4Singapore Management University, Singapore, Singapore EMAIL
Pseudocode	Yes	Algorithm 1 shows the pseudocode of our satisfiability checking prpcedure. Algorithm 1: Satisfiability checking for quantified LRA and NRA formulas Input: Quantified formula ϕ = Q1x1.Q2x2. . . . Qnxn.p where p is in CNF. Parameter: D, max polynomial degree of templates Output: (SAT,existential model), or UNKNOWN 1: For every i where Qi , let fi be a polynomial over x1, . . . xi 1 of degree D with unknown coefficients. 2: Replace each occurrence of xi with fi, and obtain ϕUNIV := x1 . . . xm.p UNIV x1 . . . xm.ψ1 ψr. 3: Let ϕFREE := True. 4: for all ψi do 5: Convert ψi into a polynomial entailment Φ ψ. 6: Apply Farkas, Handelman or Positivstellensätze Theorem to Φ ψ in order to obtain ψi. 7: ϕFREE ϕFREE ψi. 8: end for 9: if ϕFREE is satisfiable then 10: model := get Model(ϕFREE) 11: Return (SAT,model) 12: else 13: Return UNKNOWN 14: end if
Open Source Code	Yes	We implemented a prototype of our method for satisfiability checking in a tool called Quanti SAT1... 1https://doi.org/10.5281/zenodo.13341655
Open Datasets	Yes	Benchmarks. We consider three benchmark suites of quantified formulas in LRA and NRA: 1. The Keymaera benchmark suite, taken from SMT-COMP (Bobot et al. 2023), contains 222 LRA formulas and 3813 non-linear formulas. ... 2. The Mjollnir (Monniaux 2010) benchmark suite consists of 3600 LRA formulas. ... 3. The Poly Synth (Goharshady et al. 2023) benchmark suite consists of 32 NRA formulas.
Dataset Splits	No	Benchmarks. We consider three benchmark suites of quantified formulas in LRA and NRA: 1. The Keymaera benchmark suite, taken from SMT-COMP (Bobot et al. 2023), contains 222 LRA formulas and 3813 non-linear formulas. ... 2. The Mjollnir (Monniaux 2010) benchmark suite consists of 3600 LRA formulas. ... 3. The Poly Synth (Goharshady et al. 2023) benchmark suite consists of 32 NRA formulas. (Explanation: The paper uses pre-existing benchmark suites as full sets of instances and does not describe any further splitting into training/test/validation sets by the authors for their experiments.)
Hardware Specification	Yes	For the experiments, we used a Debian 12 machine with a 2.45GHz AMD EPYC 7763 CPU and 16 GB of RAM.
Software Dependencies	No	Our tool Quanti SAT is written in Python and it uses Poly Horn (Chatterjee et al. 2024) as a back-end tool for applying the Positivstellensätze theorems in Step 2 of the algorithm. It uses Z3 (de Moura and Bjørner 2008) and Math SAT5 (Cimatti et al. 2013) for solving the quantifier-free formulas derived in Step 3 of the algorithm. (Explanation: The paper mentions software tools like Python, Poly Horn, Z3, and Math SAT5, but does not provide specific version numbers for any of them, which is required for reproducibility.)
Experiment Setup	Yes	In our experiments, we ran Quanti SAT with polynomial degrees for Skolem function templates equal to D {0, 1, 2}. Benchmarks. We consider three benchmark suites of quantified formulas in LRA and NRA: 1. The Keymaera benchmark suite... Experimental Setup and Baselines. We compare Quanti SAT against two state-of-the-art SMT solvers Z3 (de Moura and Bjørner 2008) and CVC5 (Barbosa et al. 2022). The timeout for each tool on each benchmark is set to 10 minutes. Since we are not only interested in satisfiability checking but also in computing witnesses for existentially quantified variables, in order to instruct Z3 and CVC5 to compute these witnesses, we provide them with two different variants of our benchmarks: Uninterpreted Skolemization. In the first variant, we only require the baselines to compute some witnesses for existentially quantified variables, not necessarily being polynomial expressions. Hence, we replace each existentially quantified variable by an uninterpreted predicate over the preceding universally quantified variables. For Z3, we denote the resulting baseline by Z3-u Sk. Since CVC5 does not support uninterpreted predicates, we could not evaluate it on this variant. Our goal here is to evaluate the effectiveness of our template-based Skolemization as opposed to general Skolem functions. Template Skolemization. In the second variant, we ask our baselines to compute witnesses for existentially quantified variables in terms of polynomials over universally quantified variables. Hence, rather than only using uninterpreted predicates, in this variant we use the same polynomial Skolem function templates as in our Quanti SAT. We consider polynomial degrees D {0, 1, 2}, and count each instance as solved by the baseline if it can be solved for at least one of these three polynomial degrees. For Z3, we denote the resulting baseline by Z3-t Sk. For CVC5, we denote the resulting baseline by CVC5-t Sk. Our goal here is to evaluate the effectiveness of our quantifier elimination method based on Positivstellensätze theorems, as opposed to other quantifier elimination procedures implemented in these SMT solvers.