Near-Optimal Decision Trees in a SPLIT Second

Authors: Varun Babbar, Hayden Mctavish, Cynthia Rudin, Margo Seltzer

ICML 2025

Each reproducibility variable is listed below with its result and the supporting LLM response.
Research Type: Experimental
"Our experimental results show dramatic improvements in runtime compared to state-of-the-art algorithms, with negligible loss in accuracy or sparsity. For all experiments below we set the depth budget of our algorithms to 5. The lookahead depth for Algorithm 2 is set to 2 since, from Corollary 6.2, this produces the lowest runtime for the chosen depth budget. We defer more details of our experimental setup and datasets to Appendix A.7.4. Appendix A.3 has additional evaluations of our methods."
Researcher Affiliation: Academia
"¹Department of Computer Science, Duke University, Durham, USA. ²Department of Computer Science, University of British Columbia, Vancouver, Canada. Correspondence to: Varun <EMAIL>, Hayden <EMAIL>."
Pseudocode: Yes
Algorithm 1: get_bounds(D, dl, d, d′, N) → (lb, ub)
Algorithm 2: SPLIT(ℓ, D, λ, dl, d, p)
Algorithm 3: LicketySPLIT(ℓ, D, λ, d)
Algorithm 4: Greedy(D, d, λ) → (tgreedy, lb)
Algorithm 5: RESPLIT(ℓ, D, λ, dl, d)
Open Source Code: Yes
"Code for our algorithms and experiments can be found at https://github.com/VarunBabbar/SPLIT-ICML."
Open Datasets: Yes
"The Home Equity Line of Credit (HELOC) (FICO, 2018) dataset used for the Explainable ML Challenge... The Bike dataset (Fanaee-T & Gama, 2013)..."
References: FICO. Home Equity Line of Credit (HELOC) dataset. https://community.fico.com/s/explainable-machine-learning-challenge, 2018. FICO Explainable Machine Learning Challenge. Fanaee-T, H. and Gama, J. Event labeling combining ensemble detectors and background knowledge. Progress in Artificial Intelligence, pp. 1–15, 2013.
Dataset Splits: Yes
"All datasets were evaluated on three random 80-20 train-test splits of the data, with the average and standard error reported for each metric."
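The split-and-average protocol described above can be sketched as follows. This is a hypothetical illustration, not the paper's code: the synthetic dataset and the scikit-learn `DecisionTreeClassifier` are stand-ins for the paper's datasets and SPLIT variants, but the three random 80-20 splits and the mean ± standard error reporting match the stated protocol.

```python
# Sketch of the evaluation protocol: three random 80-20 train-test splits,
# reporting the mean and standard error of the metric across splits.
# Synthetic data and a plain sklearn tree are stand-ins (assumptions).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

accs = []
for seed in range(3):  # three random splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    clf = DecisionTreeClassifier(max_depth=5, random_state=seed)
    clf.fit(X_tr, y_tr)
    accs.append(clf.score(X_te, y_te))

mean_acc = np.mean(accs)
# Standard error of the mean over the three splits.
std_err = np.std(accs, ddof=1) / np.sqrt(len(accs))
print(f"accuracy: {mean_acc:.3f} +/- {std_err:.3f}")
```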
Hardware Specification: Yes
"All experiments were performed on an institutional computing cluster: a single Intel(R) Xeon(R) Gold 6226 machine with a 2.70 GHz CPU, 24 cores, and 300 GB of RAM."
Software Dependencies: No
The paper mentions using the standard scikit-learn DecisionTreeClassifier class and building upon the GOSDT and TreeFARMS codebases, but it does not provide version numbers for these or for any other key software dependencies (e.g., Python, PyTorch).
Experiment Setup: Yes
"For all experiments below we set the depth budget of our algorithms to 5. The lookahead depth for Algorithm 2 is set to 2... We vary the sparsity penalty, λ, which is a common input to all algorithms in this figure, and compute the regularized test objective from Equation 1 for each value of λ. We set a timeout limit of 1200 seconds for GOSDT, after which it returns the best solution found so far."