Witty: An Efficient Solver for Computing Minimum-Size Decision Trees
Authors: Luca Pascal Staus, Christian Komusiewicz, Frank Sommer, Manuel Sorge
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we test this paradigm empirically. We provide an implementation, augment it with extensive heuristic improvements, and scrutinize it on standard benchmark instances. The augmentations achieve a mean 324-fold (median 84-fold) speedup over the naive implementation. Compared to the state of the art they achieve a mean 32-fold (median 7-fold) speedup over the dynamic programming based MurTree solver and a mean 61-fold (median 25-fold) speedup over SAT-based implementations. |
| Researcher Affiliation | Academia | 1Institute of Computer Science, Friedrich Schiller University Jena, Germany 2Institute of Logic and Computation, TU Wien, Austria EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Base witness-tree algorithm. Input: A witness tree W, a data set (E, λ), and a maximum size s ∈ ℕ. Output: A perfect witness tree of size at most s, or ⊥ if none could be found. 1 Function Refine(W, (E, λ), s): 2 if W classifies (E, λ) then return W; 3 if W has size s then return ⊥; 4 e ← some dirty example from Dirty(W); 5 forall r = (v, i, t, e) ∈ Ref(W) do 6 Apply r to W to create a new witness tree R; 7 R′ ← Refine(R, (E, λ), s); 8 if R′ ≠ ⊥ then return R′ |
| Open Source Code | Yes | Code, Data and Experimental Results https://doi.org/10.5281/zenodo.11235017 |
| Open Datasets | Yes | We tested the solvers on standard benchmark data sets from the Penn Machine Learning Benchmarks (Romano et al. 2022). |
| Dataset Splits | Yes | Following Janota and Morgado (2020), we randomly sampled multiple subsets of the examples from each data set. Specifically, for each data set we chose 10 random subsets with 20% of the examples and 10 random subsets with 50% of the examples. |
| Hardware Specification | Yes | Our experiments were performed on servers with 24 GB RAM and two Intel(R) Xeon(R) E5540 CPUs with 2.53 GHz, 4 cores, 8 threads, running Java openjdk 11.0.19. Each individual experiment was allowed to use up to 12 GB RAM. All algorithms were executed on a single core. |
| Software Dependencies | Yes | running Java openjdk 11.0.19. We implemented our algorithm in Kotlin (see Zenodo). To solve the LP-relaxation of the Pair LB we used Gurobi 10.0.3 (Gurobi Optimization, LLC 2023). |
| Experiment Setup | No | No specific hyperparameters (e.g., learning rate, batch size, number of epochs, optimizer settings) for the algorithm or training process are explicitly provided in the main text. The paper describes the experimental environment and data splitting, but does not give configuration parameters for running the proposed algorithm beyond its structural components. |
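The recursive refinement loop quoted in the Pseudocode row can be illustrated with a toy sketch: grow a decision tree one split at a time, backtracking once the size budget s is exhausted. This is our own simplified stand-in (names `find_tree`, `classify`, and the brute-force leaf-expansion step are hypothetical), not the Witty solver's actual witness-tree machinery.

```python
# Toy sketch of Algorithm 1's search pattern: refine a tree until it classifies
# every example, returning None (standing in for the paper's "⊥") on failure.
# Trees are ("leaf", label) or ("node", feature, left_subtree, right_subtree).

def classify(tree, x):
    """Route example x (a tuple of 0/1 feature values) through the tree."""
    if tree[0] == "leaf":
        return tree[1]
    _, f, lo, hi = tree
    return classify(hi if x[f] else lo, x)

def find_tree(examples, s, n_features):
    """Return a tree with at most s inner nodes fitting all examples, else None."""
    def fits(tree):
        return all(classify(tree, x) == y for x, y in examples)

    def expansions(tree):
        # Refinement step: replace some leaf by a single split (adds 1 inner node).
        if tree[0] == "leaf":
            for f in range(n_features):
                for l0 in (0, 1):
                    for l1 in (0, 1):
                        yield ("node", f, ("leaf", l0), ("leaf", l1))
        else:
            _, f, lo, hi = tree
            for sub in expansions(lo):
                yield ("node", f, sub, hi)
            for sub in expansions(hi):
                yield ("node", f, lo, sub)

    def refine(tree, budget):
        if fits(tree):              # line 2: W classifies (E, λ)
            return tree
        if budget == 0:             # line 3: size budget exhausted
            return None
        for new in expansions(tree):   # lines 5-6: apply each refinement
            result = refine(new, budget - 1)
            if result is not None:     # lines 7-8: propagate a success
                return result
        return None

    for label in (0, 1):            # try both trivial single-leaf start trees
        t = refine(("leaf", label), s)
        if t is not None:
            return t
    return None
```

For example, XOR on two binary features needs three inner nodes, so `find_tree(xor_examples, 2, 2)` exhausts the search and returns `None`, while a budget of 3 yields a perfect tree. The real solver prunes this search far more aggressively (dirty-example selection, lower bounds, caching), which is where the reported speedups come from.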
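The sampling scheme quoted in the Dataset Splits row (10 random subsets at 20% of the examples and 10 at 50%, per data set) can be reproduced with a short sketch. The function name `sample_subsets` and the fixed seed are our own assumptions for illustration; the paper does not specify its random-number handling.

```python
import random

def sample_subsets(n_examples, seed=0):
    """Draw 10 random index subsets at 20% and 10 at 50% of n_examples,
    mirroring the splitting scheme described in the paper (assumed details:
    sampling without replacement, rounding to the nearest integer)."""
    rng = random.Random(seed)
    subsets = []
    for frac in (0.2, 0.5):
        k = round(n_examples * frac)
        for _ in range(10):
            subsets.append(sorted(rng.sample(range(n_examples), k)))
    return subsets
```

For a data set with 100 examples this produces 20 subsets: ten of size 20 followed by ten of size 50, each a duplicate-free list of example indices.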