Learning Optimal Decision Sets and Lists with SAT

Authors: Jinqiang Yu, Alexey Ignatiev, Peter J. Stuckey, Pierre Le Bodic

JAIR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments in this paper demonstrate that the optimal decision sets computed by the SAT-based approach are comparable with the best heuristic methods, but much more succinct and thus more explainable. We contrast the size and test accuracy of optimal decision lists versus optimal decision sets, as well as other state-of-the-art methods for determining optimal decision lists. Finally, we examine the size of average explanations generated by decision sets and decision lists. This is further detailed in Section 7, 'Experimental Results'.
Researcher Affiliation | Academia | Jinqiang Yu (EMAIL), Alexey Ignatiev (EMAIL), Peter J. Stuckey (EMAIL), Pierre Le Bodic (EMAIL); Department of Data Science and Artificial Intelligence, Monash University, Australia
Pseudocode | No | No explicit pseudocode or algorithm blocks are present in the main text of the paper; the methods are described in narrative text with mathematical formulations.
Open Source Code | Yes | All the proposed models are implemented as collections of Python scripts (footnote 10) and solving is done by instrumenting incremental calls to the SAT solver Glucose 3 (Audemard, Lagniez, & Simon, 2013) and the MaxSAT solver RC2-B (Ignatiev, Morgado, & Marques-Silva, 2018, 2019a). Footnote 10: https://github.com/alexeyignatiev/minds/ and https://github.com/jinqiang-yu/dlsat/
Open Datasets | Yes | For the evaluation, we use the 71 datasets from the UCI Machine Learning Repository (Dua & Graff, 2017) and the Penn Machine Learning Benchmarks (Olson, La Cava, Orzechowski, Urbanowicz, & Moore, 2017).
Dataset Splits | Yes | We also use 5-fold cross validation, which results in 355 pairs of training and test data, split in the ratio 80% to 20%, respectively.
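The arithmetic behind this row is easy to verify: 71 datasets times 5 folds yields 355 train/test pairs, each fold in turn serving as the 20% held-out test set. A minimal standard-library sketch of how such splits can be generated (the helper name and the seed are illustrative, not taken from the paper):

```python
import random

def five_fold_splits(examples, seed=1234):
    """Partition a dataset into 5 folds; yield (train, test) pairs
    where each test fold holds ~20% of the data and the remaining
    four folds (~80%) form the training split."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::5] for i in range(5)]
    for k in range(5):
        test = [examples[i] for i in folds[k]]
        train = [examples[i] for j in range(5) if j != k for i in folds[j]]
        yield train, test

# One cross-validated dataset of 100 examples: 5 pairs, each 80/20.
data = list(range(100))
pairs = list(five_fold_splits(data))
# Over 71 datasets, 71 * 5 = 355 such pairs, matching the quoted count.
```

The index-slicing `idx[i::5]` guarantees the folds partition the data exactly, so every example appears in precisely one test fold across the 5 pairs.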
Hardware Specification | Yes | Experimental results are obtained on the StarExec cluster (footnote 9; Stump, Sutcliffe, & Tinelli, 2014), each computing node of which uses an Intel Xeon E5-2609 2.40 GHz CPU with 128 GB of RAM, running CentOS 7.7.
Software Dependencies | Yes | All the proposed models are implemented as collections of Python scripts, and solving is done by instrumenting incremental calls to the SAT solver Glucose 3 (Audemard, Lagniez, & Simon, 2013) and the MaxSAT solver RC2-B (Ignatiev, Morgado, & Marques-Silva, 2018, 2019a).
Experiment Setup | Yes | The time limit and memory limit used per process are 1800 seconds and 16 GB. For the evaluation, we use the 71 datasets from the UCI Machine Learning Repository (Dua & Graff, 2017) and the Penn Machine Learning Benchmarks (Olson, La Cava, Orzechowski, Urbanowicz, & Moore, 2017). We also use 5-fold cross validation, which results in 355 pairs of training and test data, split in the ratio 80% to 20%, respectively. Finally, feature domains are quantized into 2, 3, and 4 intervals and then one-hot encoded (Pedregosa et al., 2011). Some variants of the MaxSAT model for sparse decision sets, named sp[λi], are assessed with three different values of the regularization penalty: λ1 = 0.005, λ2 = 0.05, and λ3 = 0.5.