Optimal Decision Tree Pruning Revisited: Algorithms and Complexity
Authors: Juha Harviainen, Frank Sommer, Manuel Sorge, Stefan Szeider
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a comprehensive classical and parameterized complexity analysis of decision tree pruning operations, extending recent research on the complexity of learning small decision trees. Thereby, we offer new insights into the computational challenges of decision tree simplification, a crucial aspect of developing interpretable and efficient machine learning models. We focus on fundamental pruning operations of subtree replacement and raising, which are used in heuristics. Surprisingly, while optimal pruning can be performed in polynomial time for subtree replacement, the problem is NP-complete for subtree raising. Therefore, we identify parameters and combinations thereof that lead to fixed-parameter tractability or hardness, establishing a precise borderline between these complexity classes. For example, while subtree raising is hard for small domain size D or number d of features, it can be solved in D^{2d} · |I|^{O(1)} time, where |I| is the input size. We complement our theoretical findings with preliminary experimental results, demonstrating the practical implications of our analysis. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Helsinki, Helsinki, Finland 2Institute of Logic and Computation, TU Wien, Austria. Correspondence to: Juha Harviainen <EMAIL>, Frank Sommer <EMAIL>, Manuel Sorge <EMAIL>, Stefan Szeider <EMAIL>. |
| Pseudocode | No | The paper describes algorithms using prose and references theorems and proofs, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks with structured, code-like formatting. |
| Open Source Code | Yes | A continuously updated version of our paper is available on arXiv (Harviainen et al., 2025a) and the related source code for replicating the experiments on Zenodo (Harviainen et al., 2025b). |
| Open Datasets | Yes | We used 40 datasets from the Penn Machine Learning Benchmarks library (Romano et al., 2022), including 32 previously used for minimum-size tree computation (Bessiere et al., 2009; Narodytska et al., 2018; Staus et al., 2025) and additional larger datasets. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits or cross-validation details for the experiments conducted by the authors. It mentions using pre-existing datasets and WEKA for generating trees, but doesn't specify how the data was partitioned for their analysis. |
| Hardware Specification | Yes | We ran the implementation under Ubuntu Linux 18.04 on a compute cluster with Intel Xeon E5-2640 processors, setting a maximum RAM limit of 64GB. |
| Software Dependencies | Yes | We computed unpruned and pruned trees using WEKA 3.8.5's (Frank et al., 2010) C4.5 implementation (Quinlan, 1993)... We implemented a dynamic-programming algorithm for solving DTRAIS= based on Theorem 4.2 in Python, tested with versions 3.6.9 and 3.10.12. |
| Experiment Setup | Yes | The unpruned trees were obtained by running WEKA's J48 classifier with the flags -no-cv -B -M 0 -J -U. We obtained two types of pruned trees: those obtained by the replacement heuristic implemented in J48 when run with the flags -no-cv -B -M 0 -J -S, and those obtained by the raising heuristic, corresponding to the flags -no-cv -B -M 0 -J. |
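The abstract row above contrasts the two pruning operations the paper studies: subtree replacement (optimally solvable in polynomial time) and subtree raising (NP-complete). The following minimal Python sketch illustrates bottom-up subtree replacement on training data; the tree encoding, function names, and majority-leaf tie-breaking are illustrative assumptions, not the authors' implementation or WEKA's.

```python
# Hedged sketch of bottom-up subtree replacement: replace an internal node
# by a majority-class leaf whenever that does not increase training error.
# Tree encoding (assumed for illustration):
#   ("leaf", label)  or  ("split", feature, {value: subtree}).
# Examples are (feature_dict, label) pairs, and every feature value seen in
# the data is assumed to have a matching branch in the split's children.
from collections import Counter

def classify(tree, feats):
    """Route a single example down the tree and return the leaf label."""
    while tree[0] == "split":
        tree = tree[2][feats[tree[1]]]
    return tree[1]

def errors(tree, examples):
    """Number of training examples the tree misclassifies."""
    return sum(classify(tree, f) != y for f, y in examples)

def prune_by_replacement(tree, examples):
    """Prune bottom-up: children first, then try to collapse this node."""
    if tree[0] == "leaf":
        return tree
    _, feature, children = tree
    # Recurse on each child with the examples that reach it.
    pruned_children = {}
    for value, child in children.items():
        subset = [(f, y) for f, y in examples if f[feature] == value]
        pruned_children[value] = prune_by_replacement(child, subset)
    pruned = ("split", feature, pruned_children)
    # Candidate leaf: majority class among the examples reaching this node.
    labels = Counter(y for _, y in examples)
    if not labels:
        return pruned
    majority, _ = labels.most_common(1)[0]
    if errors(("leaf", majority), examples) <= errors(pruned, examples):
        return ("leaf", majority)
    return pruned
```

A split whose branches all predict the same class as the local majority collapses to a single leaf, while a split that separates classes on the training data is kept; subtree raising, by contrast, may move a child subtree up to replace its parent, which is where the paper's hardness results apply.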