SAT-based Decision Tree Learning for Large Data Sets
Authors: Andre Schidler, Stefan Szeider
JAIR 2024 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our new approach experimentally on a range of real-world instances that contain up to several thousand samples. In almost all cases, our method successfully decreases the complexity of the initial decision tree; often, the decrease is significant. |
| Researcher Affiliation | Academia | André Schidler EMAIL Stefan Szeider EMAIL Algorithms & Complexity, TU Wien, Favoritenstrasse 9-11, 1040 Vienna, Austria |
| Pseudocode | Yes | Algorithm 1: DT-SLIM(H) |
| Open Source Code | Yes | Results and source code are available at https://doi.org/10.5281/zenodo.11314227 and the current version of the source code is available at https://github.com/ASchidler/decision_tree. |
| Open Datasets | Yes | We evaluate our new approach experimentally on data sets from the UCI Machine Learning Repository and prior work. |
| Dataset Splits | Yes | We split the other 61 instances into five folds for cross validation, resulting in 313 instances in total. ... For the eight instances that provided a test set, the validation set is created by splitting off 25% of the samples. |
| Hardware Specification | Yes | We ran the experiments with a memory limit of 12 GB on servers with two Intel Xeon E5-2640 v4 CPUs running at 2.40 GHz and using Ubuntu 18.04. |
| Software Dependencies | Yes | Our implementation uses Python 3.9 and Py SAT 0.1.73 (Ignatiev, Morgado, & Marques-Silva, 2018). We use an adapted version of the Glucose 3.04 (Audemard & Simon, 2009) SAT solver... Weka 3.8.55 (Frank, Hall, Holmes, Kirkby, & Pfahringer, 2005) and CART implemented in scikit-learn 0.24.16 (Pedregosa et al., 2011). |
| Experiment Setup | Yes | We use a sample limit of 25000 samples for low depths and incrementally lower it to 250 until the maximal depth of 14. ... We picked our limits such that it is reasonable to expect that the solver will finish within a given timeout, in our case five minutes. ... DT-SLIM and STree D were limited to 12 hours per run. |