reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Certified Knowledge Compilation with Application to Formally Verified Model Counting

Authors: Randal E. Bryant, Wojciech Nawrocki, Jeremy Avigad, Marijn J. H. Heule

JAIR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experimental results seek to answer the following questions: ... All experiments were run on a 2021 Apple Mac Book Pro, with a 3.2 Ghz Apple M1 processor and 64 GB of RAM. We used a Samsung T7 solid-state disk (SDD) for file storage. ... For benchmark problems, we used the public problems from the 2022 unweighted and weighted model counting competitions.
Researcher Affiliation	Academia	Randal E. Bryant EMAIL Wojciech Nawrocki EMAIL Jeremy Avigad EMAIL Marijn J. H. Heule EMAIL Carnegie Mellon University Pittsburgh, PA USA
Pseudocode	No	The paper describes methods and steps in prose, for example, the "structural approach" in Section 8.1 and the "CPOG Step Types" in Table 1. However, it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures.
Open Source Code	Yes	The source code for all tools, as well as the Lean 4 derivation and checker, is available at https://zenodo.org/records/14933727.
Open Datasets	Yes	For benchmark problems, we used the public problems from the 2022 unweighted and weighted model counting competitions.5 ... 5. Downloaded from https://mccompetition.org/2022/mc_description.html
Dataset Splits	No	The paper evaluates its toolchain using 'public problems from the 2022 unweighted and weighted model counting competitions'. These are benchmark CNF formulas used as inputs for model counting, not datasets that are typically split into training, validation, and test sets for model training. Therefore, the concept of dataset splits as defined in the question is not directly applicable, and no such splits are provided.
Hardware Specification	Yes	All experiments were run on a 2021 Apple Mac Book Pro, with a 3.2 Ghz Apple M1 processor and 64 GB of RAM. We used a Samsung T7 solid-state disk (SDD) for file storage.
Software Dependencies	No	The paper mentions several software tools such as 'Ca Di Cal SAT solver', 'lrat-trim', 'D4 knowledge compiler', and 'Lean 4 theorem prover and programming language'. However, it does not provide specific version numbers for these software components, only referring to 'a recent version' of Ca Di Cal or citing a paper for Lean 4, which is not a software version.
Experiment Setup	Yes	The experimental setup details for the hybrid approach to proof generation are explicitly given in Section 12.2: 'If the tree ratio is at most 5.0, and the tree size of the root is below 10^6, do the entire proof generation with a monolithic approach... If the tree ratio is at most 5.0, and the tree size of the root is above 10^6, start with a structural approach and shift to a monolithic approach once the tree size at a node is below 10^6. If the tree ratio is above 5.0, do the entire proof generation with a structural approach.' The paper also states a 'runtime limit of 4,000 seconds' for D4 and '10,000 seconds for the toolchain'.