Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Comparison of SAT-Based and ASP-Based Algorithms for Inconsistency Measurement
Authors: Isabelle Kuhlmann, Anna Gessler, Vivien Laszlo, Matthias Thimm
JAIR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In an extensive experimental analysis, we compare the SAT-based and ASP-based approaches with each other, as well as with a set of naive baseline algorithms. Our results demonstrate that, overall, both the SAT-based and the ASP-based approaches clearly outperform the naive baseline methods in terms of runtime. The results further show that the proposed ASP-based approaches perform superior to the SAT-based ones with regard to all six inconsistency measures considered in this work. Moreover, we conduct additional experiments to explain the aforementioned results in greater detail. |
| Researcher Affiliation | Academia | Isabelle Kuhlmann, University of Hagen, Germany; Anna Gessler, University of Koblenz, Germany; Vivien Laszlo; Matthias Thimm, University of Hagen, Germany |
| Pseudocode | No | The paper describes algorithms in Sections 3 ("SAT-Based Algorithms for Selected Inconsistency Measures") and 4 ("ASP-Based Algorithms for Selected Inconsistency Measures"). However, it describes them in prose and presents "Encodings" (e.g., "Encoding 1: Overview of the SAT encoding Sc(K, u)" or "Encoding 7: Overview of the ASP encoding Pc(K)") which detail logical rules or constraints rather than structured, step-by-step pseudocode blocks. |
| Open Source Code | No | The paper provides a link for downloading the datasets used in the experiments: "Download (all data sets): https://fernuni-hagen.sciebo.de/s/sML2faFBiCib1nm". However, it does not explicitly state that the source code for the methodology described in the paper is available, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | To begin with, we consider the results wrt. the SRS data set, which has been used previously in (Kuhlmann & Thimm, 2021; Kuhlmann et al., 2022). Table 5 provides the number of solved instances as well as the cumulative runtime of each approach wrt. each measure and each data set. (For a more detailed overview, see also the plots in B.1 and B.3.) Looking at the SRS part of Table 5, a first noticeable observation is that, overall, the ASP-based approaches perform best, while the naive approaches perform, as expected, poorest. More precisely, the ASP-based approaches only time out in 175 out of 10,800 cases (1.62 %). To set this into perspective: the SAT approaches time out in 1552 cases (14.37 %), and the naive ones in [...] Download (all data sets): https://fernuni-hagen.sciebo.de/s/sML2faFBiCib1nm. The ML data set consists of knowledge bases containing formulas learnt from machine learning data (Thimm & Rienstra, 2020). More precisely, the data set "Animals with Attributes" (http://attributes.kyb.tuebingen.mpg.de) describes 50 animals, e.g. ox, mouse, or dolphin, using 85 binary attributes such as "swims", "black", and "arctic". We used the Apriori algorithm (Agrawal & Srikant, 1994) to mine association rules from this data set for a given minimal confidence value c and minimal support value s. All these rules were then interpreted as propositional logic implications. Finally, we selected one animal at random and added all its attributes as facts (thus making the knowledge base inconsistent, as even rules with low confidence values were interpreted as strict implications). We set [...] Data set ARG consists of a total of 326 knowledge bases extracted from benchmark data of the International Competition on Computational Models of Argumentation (http://argumentationcompetition.org/2019/). Data set SC consists of the 100 smallest (in terms of file size) instances of the benchmark data set from the main track of the SAT Competition 2020 (https://satcompetition.github.io/2020/downloads.html). Data set LP is generated from benchmark data for answer set programming. More specifically, we used the problems 15-Puzzle (11 instances), Hamiltonian Cycle (20 instances), and Labyrinth (14 instances), available from the Asparagus website (https://asparagus.cs.uni-potsdam.de). |
| Dataset Splits | No | The paper describes the creation and composition of five different datasets (SRS, ML, ARG, SC, LP) used for evaluation. It details how knowledge bases were generated or extracted for each dataset. However, it does not provide specific information regarding dataset splits (e.g., percentages, sample counts, or methods for partitioning data into training, testing, or validation sets) that would be needed to reproduce the experiments. The evaluation focuses on running algorithms on these datasets as a whole, rather than on specific data splits. |
| Hardware Specification | Yes | Due to the significant amount of time required to complete the evaluation, two different servers were used to facilitate the evaluation of multiple data sets in parallel. The experiments regarding the SRS data set were run on a computer with 128 GB RAM and an Intel Xeon E5-2690 CPU which has a basic clock frequency of 2.90 GHz. The other data sets (ML, ARG, SC, and LP) were evaluated on server instances with 32 GB RAM and an Intel Xeon Platinum 8260M CPU with a basic clock frequency of 2.40 GHz. |
| Software Dependencies | Yes | Both the SAT-based and the ASP-based approaches are implemented in C++. The SAT solver we use is CaDiCaL sc2021 (Biere, Fazekas, Fleury, & Heisinger, 2020), and the ASP solver we use is Clingo 5.5.1 (Gebser, Kaminski, Kaufmann, & Schaub, 2019). |
| Experiment Setup | Yes | In our experiment, we focus on the three previously mentioned inconsistency measures (I_c, I_d^max, and I_d^Σ) wrt. the SRS data set. Table 6 displays the results (see also Figures 8, 29, and 30 for further visualizations). We can see that with regard to I_c, the linear search variant of the SAT approach is on average a bit slower than the binary search variant. Nevertheless, the two methods perform quite similarly. A timeout was set to 600 seconds (i.e., 10 minutes) for instances from the SRS, ML, and ARG data sets. For instances from the SC and LP data sets, which are overall more challenging compared to those instances from the previously mentioned data sets, we increased the timeout to 5000 seconds (i.e., 83.3 minutes). This corresponds to the time limit used in the SAT Competition 2020, from which the instances in the SC data set were selected. For the computation of cardinality constraints in SAT we use the sequential counter encoding (Sinz, 2005), and for transforming formulas to CNF we use Tseitin's method (Tseitin, 1968). Clingo's default option (which is used in all experiments described above) is a branch-and-bound approach (Gebser, Kaufmann, & Schaub, 2012). In this approach, we first search for an initial solution, i.e., an initial stable model, and (in case a solution exists) obtain its objective value. The next step consists of a loop in which we try to find a solution with a strictly better objective value, i.e., in the case of minimization, a strictly lower value, and in the case of maximization, a strictly higher value. This is achieved by adding constraints which ensure that any new solution must have a lower (resp. higher) objective value than the solution derived before. If no better solution can be found, we know that the previous solution corresponds to an optimum (Kaminski, Schaub, & Wanko, 2017). As an example, we perform an experiment in which we set Clingo's optimization strategy to usc (Gebser, Kaminski, Kaufmann, Romero, & Schaub, 2015), which employs a core-guided approach (Andres, Kaufmann, Matheis, & Schaub, 2012). The latter emerged in the area of MaxSAT solving (Fu & Malik, 2006; Marques-Silva & Manquinho, 2008) and is based on the following procedure. |
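The branch-and-bound scheme quoted in the Experiment Setup cell (find an initial model, then repeatedly demand a strictly better objective until none exists) can be sketched in miniature. This is an illustrative toy, not the paper's C++ implementation or Clingo's API: `solve` is a hypothetical brute-force oracle over a tiny SAT-like problem whose objective is the number of true variables.

```python
from itertools import product

def solve(n_vars, clauses, max_obj):
    """Stand-in 'solver': return any assignment satisfying all clauses
    whose objective (number of true variables) is strictly below max_obj,
    or None if no such model exists. Clauses are lists of signed ints,
    e.g. [1, -2] means (x1 or not x2)."""
    for bits in product([False, True], repeat=n_vars):
        if sum(bits) >= max_obj:  # bound constraint: demand a better objective
            continue
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None

def branch_and_bound(n_vars, clauses):
    """Minimize the objective by repeatedly tightening the bound:
    when the solver reports no strictly better model, the last one
    found is optimal."""
    best, bound = None, n_vars + 1
    while True:
        model = solve(n_vars, clauses, bound)
        if model is None:                 # no strictly better model: optimum
            return best
        best, bound = model, sum(model)   # keep model, tighten the bound

# Minimal usage: clauses (x1 or x2) and (x2 or x3); setting only x2 is optimal.
print(branch_and_bound(3, [[1, 2], [2, 3]]))  # → (False, True, False)
```

Clingo's actual branch-and-bound (`--opt-strategy=bb`) adds the tightening constraint incrementally instead of restarting the search, but the loop structure is the same.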
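The sequential counter encoding (Sinz, 2005) that the paper uses for cardinality constraints in SAT can likewise be illustrated. The sketch below, assuming n ≥ 2 and 1 ≤ k < n, emits CNF clauses for an at-most-k constraint using auxiliary "register" variables; the function name and variable numbering are our own, not the paper's implementation.

```python
from itertools import product

def at_most_k_seq(xs, k, next_var):
    """Sequential counter encoding of 'at most k of xs are true'.
    xs: positive variable ids; next_var: first unused auxiliary id.
    Returns (clauses, next_free_var); clauses are lists of signed ints.
    Assumes len(xs) >= 2 and 1 <= k < len(xs)."""
    n = len(xs)
    # s[i][j] means: at least j+1 of xs[0..i] are true (a unary counter)
    s = [[next_var + i * k + j for j in range(k)] for i in range(n - 1)]
    clauses = [[-xs[0], s[0][0]]]              # x1 -> s[0][0]
    clauses += [[-s[0][j]] for j in range(1, k)]  # count after x1 is at most 1
    for i in range(1, n - 1):
        clauses.append([-xs[i], s[i][0]])          # xi raises the counter
        clauses.append([-s[i - 1][0], s[i][0]])    # counter never decreases
        for j in range(1, k):
            clauses.append([-xs[i], -s[i - 1][j - 1], s[i][j]])
            clauses.append([-s[i - 1][j], s[i][j]])
        clauses.append([-xs[i], -s[i - 1][k - 1]])  # overflow is forbidden
    clauses.append([-xs[n - 1], -s[n - 2][k - 1]])  # last variable overflow
    return clauses, next_var + (n - 1) * k

# Usage: 'at most 2 of x1..x4', auxiliaries starting at variable 5.
clauses, top = at_most_k_seq([1, 2, 3, 4], 2, 5)
```

The encoding is linear in n·k, which is why it is a common choice over naive pairwise encodings when k is small relative to n.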