MurTree: Optimal Decision Trees via Dynamic Programming and Search

Authors: Emir Demirović, Anna Lukina, Emmanuel Hebrard, Jeffrey Chan, James Bailey, Christopher Leckie, Kotagiri Ramamohanarao, Peter J. Stuckey

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show in a detailed experimental study that our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances, providing several orders of magnitude improvements and notably contributing towards the practical use of optimal decision trees.
Researcher Affiliation | Academia | Emir Demirović (EMAIL), Anna Lukina (EMAIL), Delft University of Technology, Delft, The Netherlands; Emmanuel Hebrard (EMAIL), LAAS-CNRS, Toulouse, France; Jeffrey Chan (EMAIL), RMIT University, Melbourne, Australia; James Bailey (EMAIL), Christopher Leckie (EMAIL), Kotagiri Ramamohanarao (EMAIL), University of Melbourne, Melbourne, Australia; Peter J. Stuckey (EMAIL), Monash University and DATA61, Melbourne, Australia
Pseudocode | Yes | Algorithm 1: MurTree.SolveSubtree(D, B, d, n, UB), the main algorithm loop; Algorithm 2: MurTree.GeneralCase(D, B, d, n, UB), the general (fourth) case of Eq. 1 used in Algorithm 1; Algorithm 3: MurTree.SolveSubtreeGivenRootFeature(D, B, f_root, d, nL, nR, UB), a subroutine used as part of Algorithm 2; Algorithm 4: specialised algorithm for computing optimal classification trees of depth two with three nodes; Algorithm 6: MurTree.SolveSparseObjective(D, d, n, α), an algorithm to minimise the sparse objective (Eq. 19); Algorithm 7: MurTree.SolveSubtreeLexicographically(D, d, n), an algorithm to compute the tree with minimum misclassifications using the least number of nodes
Open Source Code | Yes | Our code, binarised datasets, and the binarisation script are available online: https://bitbucket.org/EmirD/murtree
Open Datasets | Yes | We use publicly available datasets used in previous works (Bertsimas and Dunn (2017); Verwer and Zhang (2019); Narodytska et al. (2018); Aglin et al. (2020a); Hu et al. (2019)), most of which are available from the UCI and CP4IM repositories. ... Our code, binarised datasets, and the binarisation script are available online: https://bitbucket.org/EmirD/murtree
Dataset Splits | Yes | Given a model, we compute the average train and test accuracy using stratified five-fold cross-validation for each combination of parameters.
Hardware Specification | Yes | Experiments were run on an Intel i7-8550U CPU with 32 GB of RAM, running one algorithm at a time using one processor.
Software Dependencies | No | The implementation of MDLP from the R programming package was used (Kim (2015)). ... We use the algorithms provided by the sklearn (Pedregosa et al. (2011)) Python package for machine learning for the other methods.
Experiment Setup | Yes | Hyper-parameter tuning is performed for all methods using grid search. Given a model, we compute the average train and test accuracy using stratified five-fold cross-validation for each combination of parameters. The set of parameters that leads to the best test accuracy is selected. Note that the model is trained on the training sets, but evaluated on the test sets. The runtime presented includes the time taken to perform cross-fold validation using all parameters and the time to train a new decision tree using the selected parameters on the complete dataset. All methods and parameter configurations used exactly the same folds.
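The evaluation protocol quoted above (stratified five-fold cross-validation combined with grid search, selecting the parameter set with the best mean test accuracy) can be sketched in plain Python. This is a minimal illustration of the protocol, not the paper's implementation: the round-robin fold assignment and the `train_eval` callback interface are assumptions made for the sketch.

```python
import random
from collections import defaultdict

def stratified_kfold_indices(labels, k=5, seed=0):
    """Yield k (train_idx, test_idx) pairs whose test folds preserve
    the class proportions of `labels` (stratified k-fold CV)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)  # deal each class round-robin over folds
    for t in range(k):
        test = sorted(folds[t])
        train = sorted(i for f in range(k) if f != t for i in folds[f])
        yield train, test

def grid_search(train_eval, labels, param_grid, k=5):
    """Return the parameter setting with the best mean test accuracy
    over stratified k-fold CV. `train_eval(params, train_idx, test_idx)`
    is a caller-supplied closure (hypothetical interface) that trains a
    model on the train indices and returns its test accuracy.
    All settings see exactly the same folds, as in the quoted setup."""
    splits = list(stratified_kfold_indices(labels, k))
    best_params, best_acc = None, -1.0
    for params in param_grid:
        accs = [train_eval(params, tr, te) for tr, te in splits]
        mean_acc = sum(accs) / len(accs)
        if mean_acc > best_acc:
            best_params, best_acc = params, mean_acc
    return best_params, best_acc
```

After `grid_search` returns, the protocol in the paper retrains a final tree on the complete dataset using the selected parameters; that last step depends on the learner and is omitted here.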