Learning Optimal Decision Sets and Lists with SAT

Authors: Jinqiang Yu, Alexey Ignatiev, Peter J. Stuckey, Pierre Le Bodic

JAIR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments in this paper demonstrate that the optimal decision sets computed by the SAT-based approach are comparable with the best heuristic methods, but much more succinct and thus more explainable. We contrast the size and test accuracy of optimal decision lists versus optimal decision sets, as well as other state-of-the-art methods for determining optimal decision lists. Finally, we examine the size of average explanations generated by decision sets and decision lists. This is further detailed in Section 7, 'Experimental Results'.
Researcher Affiliation | Academia | Jinqiang Yu (EMAIL), Alexey Ignatiev (EMAIL), Peter J. Stuckey (EMAIL), Pierre Le Bodic (EMAIL); Department of Data Science and Artificial Intelligence, Monash University, Australia
Pseudocode | No | No explicit pseudocode or algorithm blocks are present in the main text of the paper; the methods are described in narrative text with mathematical formulations.
Open Source Code | Yes | All the proposed models are implemented as collections of Python scripts (footnote 10) and solving is done by instrumenting incremental calls to the SAT solver Glucose 3 (Audemard, Lagniez, & Simon, 2013) and the MaxSAT solver RC2-B (Ignatiev, Morgado, & Marques-Silva, 2018, 2019a). Footnote 10: https://github.com/alexeyignatiev/minds/ and https://github.com/jinqiang-yu/dlsat/
Open Datasets | Yes | For the evaluation, we use the 71 datasets from the UCI Machine Learning Repository (Dua & Graff, 2017) and the Penn Machine Learning Benchmarks (Olson, La Cava, Orzechowski, Urbanowicz, & Moore, 2017).
Dataset Splits | Yes | We also use 5-fold cross validation, which results in 355 pairs of training and test data, split in the ratio 80% to 20%, respectively.
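The arithmetic behind this row is easy to verify: 71 datasets times 5 folds yields 355 train/test pairs, each fold in turn serving as the 20% held-out test set. A minimal standard-library sketch of how such splits can be generated (the helper name and the seed are illustrative, not taken from the paper):

```python
import random

def five_fold_splits(examples, seed=1234):
    """Partition a dataset into 5 folds; yield (train, test) pairs
    where each test fold holds ~20% of the data and the remaining
    four folds (~80%) form the training split."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::5] for i in range(5)]
    for k in range(5):
        test = [examples[i] for i in folds[k]]
        train = [examples[i] for j in range(5) if j != k for i in folds[j]]
        yield train, test

# One cross-validated dataset of 100 examples: 5 pairs, each 80/20.
data = list(range(100))
pairs = list(five_fold_splits(data))
# Over 71 datasets, 71 * 5 = 355 such pairs, matching the quoted count.
```

The index-slicing `idx[i::5]` guarantees the folds partition the data exactly, so every example appears in precisely one test fold across the 5 pairs.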
Hardware Specification | Yes | Experimental results are obtained on the StarExec cluster (footnote 9; Stump, Sutcliffe, & Tinelli, 2014), each computing node of which uses an Intel Xeon E5-2609 2.40 GHz CPU with 128 GB of RAM, running CentOS 7.7.
Software Dependencies | Yes | All the proposed models are implemented as collections of Python scripts, and solving is done by instrumenting incremental calls to the SAT solver Glucose 3 (Audemard, Lagniez, & Simon, 2013) and the MaxSAT solver RC2-B (Ignatiev, Morgado, & Marques-Silva, 2018, 2019a).
Experiment Setup | Yes | The time limit and memory limit used per process are 1800 seconds and 16 GB. For the evaluation, we use the 71 datasets from the UCI Machine Learning Repository (Dua & Graff, 2017) and the Penn Machine Learning Benchmarks (Olson, La Cava, Orzechowski, Urbanowicz, & Moore, 2017). We also use 5-fold cross validation, which results in 355 pairs of training and test data, split in the ratio 80% to 20%, respectively. Finally, feature domains are quantized into 2, 3, and 4 intervals and then one-hot encoded (Pedregosa et al., 2011). Some variants of the MaxSAT model for sparse decision sets, named sp[λi], are assessed with three different values of the regularization penalty: λ1 = 0.005, λ2 = 0.05, and λ3 = 0.5.