A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees
Authors: Haoran Zhu, Pavankumar Murali, Dzung Phan, Lam Nguyen, Jayant Kalagnanam
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using 36 data-sets from the University of California Irvine Machine Learning Repository, we demonstrate that our formulation outperforms its counterparts in the literature by an average of about 10% in terms of mean out-of-sample testing accuracy across the data-sets. |
| Researcher Affiliation | Industry | Haoran Zhu, Pavankumar Murali, Dzung T. Phan, Lam M. Nguyen, Jayant R. Kalagnanam, IBM Research, Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA (author email addresses redacted) |
| Pseudocode | Yes | Algorithm 1 LP-based data-selection in each cluster N |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | For benchmarking, we use data-sets from the UCI Machine Learning Repository [9]. |
| Dataset Splits | Yes | We used the same training-testing split as in [4]: 75% of the entire data-set as the training set and the remaining 25% as the testing set, with 5-fold cross-validation. (A minimal split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., CPU/GPU models, memory, or cloud instance types). |
| Software Dependencies | Yes | We implemented all these MIP approaches in Python 3.6 and solved them using CPLEX 12.9.0 [8]. We invoked the DecisionTreeClassifier implementation from scikit-learn to train a decision tree using CART, with default parameter settings. (A CART baseline sketch follows the table.) |
| Experiment Setup | Yes | The time limit was set to 15 or 30 minutes for medium-sized data-sets, and for larger data-sets it was increased to up to 4 hours so the solver could make sufficient progress. For all methods, the maximum tree depth was set to the same value as our SVM1-ODT. For solving S1O and OCT-H, we use the decision tree trained with CART as a warm-start solution for CPLEX, as in [4, 24]. In Algorithm 1, T(λ) is the transformation used in the proof of Theorem 4 to construct a feasible solution for (6), and β1, β2 are pre-specified threshold parameters. (A CPLEX setup sketch follows the table.) |
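
The Dataset Splits row reports a 75%/25% training-testing split with 5-fold cross-validation. The snippet below is a minimal sketch of that partitioning using scikit-learn; the data-set (`load_iris`), random seeds, and variable names are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the reported data partitioning: 75% training / 25% testing,
# with 5-fold cross-validation on the training portion (assumed usage).
from sklearn.model_selection import train_test_split, KFold
from sklearn.datasets import load_iris  # stand-in for a UCI data-set

X, y = load_iris(return_X_y=True)

# 75/25 training-testing split, as described in the paper (following [4]).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# 5-fold cross-validation over the training set, e.g. for model selection.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (tr_idx, val_idx) in enumerate(kf.split(X_train)):
    X_tr, X_val = X_train[tr_idx], X_train[val_idx]
    y_tr, y_val = y_train[tr_idx], y_train[val_idx]
    # ... train and validate a candidate model on this fold ...
```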
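The Software Dependencies row states that the CART baseline was trained with scikit-learn's DecisionTreeClassifier using default parameters. The following is a minimal, self-contained sketch of such a baseline; the data-set and split are again stand-ins rather than the authors' experimental pipeline.

```python
# Minimal sketch of the CART baseline: scikit-learn's DecisionTreeClassifier
# with default parameter settings, evaluated on a held-out 25% testing set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)  # stand-in for a UCI data-set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

cart = DecisionTreeClassifier()  # default parameters, as reported
cart.fit(X_train, y_train)
print("CART out-of-sample accuracy:",
      accuracy_score(y_test, cart.predict(X_test)))
```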
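The Experiment Setup row mentions per-instance time limits and a CART-derived warm start passed to CPLEX. The sketch below shows how such settings are typically supplied through the CPLEX Python API; the model file name, variable names, and warm-start values are hypothetical placeholders, and the mapping from a CART tree to MIP variables is specific to the paper's formulation and not reproduced here.

```python
import cplex

# Load a hypothetical exported MIP model (placeholder file name).
cpx = cplex.Cplex("svm1_odt.lp")

# Time limit in seconds: 15/30 minutes for medium data-sets,
# up to 4 hours for larger ones, per the reported setup.
cpx.parameters.timelimit.set(30 * 60)

# Supply a warm start, e.g. a feasible solution encoding a CART tree of the
# same depth. The variable names and values below are illustrative only.
warm_names = ["a_1_1", "b_1"]
warm_values = [1.0, 0.5]
cpx.MIP_starts.add(
    cplex.SparsePair(ind=warm_names, val=warm_values),
    cpx.MIP_starts.effort_level.auto,
)

cpx.solve()
```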