reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Solving Higher-Order Quantified Boolean Satisfiability via Higher-Order Model Checking

Authors: Hiroshi Unno, Takeshi Tsukada, Jie-Hong Roland Jiang

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We implemented the proposed reduction and combined it with the state-of-the-art higher-order model checker HORSAT2 (Broadbent and Kobayashi 2013; Kobayashi 2016), realizing the first HOQBF solver, which we call HOMCSAT. We evaluated the solver using a benchmark set consisting of 21 problem instances and obtained promising results while also identifying the limitations of the backend solver and suggesting directions for improvement. The experimental results are summarized in Table 1.
Researcher Affiliation	Academia	1Tohoku University 2Chiba University 3National Taiwan University EMAIL, EMAIL, EMAIL
Pseudocode	Yes	P x A.φ := let rec f = t in f(zero) λx A.Pφ(x) and if ismax(x) then true else f (x + 1). ... P x A.φ := let rec f = t in f(zero) λx A.Pφ(x) and if ismax A(x) then true else f (succ A(x))
Open Source Code	Yes	Available from https://github.com/hiroshi-unno/coar.
Open Datasets	No	We prepared a HOSAT benchmark set consisting of 21 problems to conduct preliminary experiments for evaluating HOMCSAT. The paper describes the problems used in the benchmark set, but does not provide a direct link, DOI, or specific repository name for the benchmark set itself.
Dataset Splits	No	The paper mentions using a 'HOSAT benchmark set consisting of 21 problem instances' but does not specify any training/test/validation dataset splits. The problems are treated as individual instances to be solved.
Hardware Specification	Yes	The experiments were conducted on a machine with a 12th Gen Intel(R) Core(TM) i7-1270P 2.20 GHz processor and 32 GB of memory.
Software Dependencies	No	We implemented the proposed reduction and combined it with the state-of-the-art higher-order model checker HORSAT2 (Broadbent and Kobayashi 2013; Kobayashi 2016) as the backend higher-order model checker. The paper mentions the software HORSAT2 but does not provide a specific version number for it.
Experiment Setup	Yes	T/O indicates a timeout (300 seconds), and M/O represents out of memory.