Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Leveraging Predictive Equivalence in Decision Trees

Authors: Hayden McTavish, Zachery Boner, Jon Donnelly, Margo Seltzer, Cynthia Rudin

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We apply our representation to several downstream machine learning tasks. Using our representation, we show that decision trees are surprisingly robust to test-time missingness of feature values; we address predictive equivalence's impact on quantifying variable importance; and we present an algorithm to optimize the cost of reaching predictions. We demonstrate empirically that decision trees rarely require additional missingness handling to predict on samples with missing data. In Figure 6, we introduce synthetic missingness (Missing Completely at Random) to a variety of real-world datasets by independently removing each feature of each sample with probability p.
Researcher Affiliation Academia 1Department of Computer Science, Duke University, Durham, North Carolina, USA 2Department of Computer Science, University of British Columbia, Vancouver, BC, Canada.
Pseudocode Yes Algorithm 1 Compute DNF Representation from Tree, Algorithm 2 Prediction with the DNF representation, Algorithm 3 Equality Checking, Algorithm 4 BCF, Algorithm 5 Adjusted Quine-McCluskey (with particular processing of the data structures involved to allow for proof of Theorem 3.4), Algorithm 6 Initialization of Q-Learner Using Path Traversal
Open Source Code Yes Code for our algorithms and experiments can be found at https://github.com/HaydenMcT/predictiveequivalence
Open Datasets Yes We consider four datasets throughout this work and eight additional datasets in Appendix C. We refer to the primary four as COMPAS (Larson et al., 2016), Wine Quality (Cortez et al., 2009), Wisconsin (Street et al., 1993), and Coupon (Wang et al., 2017).
Dataset Splits Yes We compute the Rashomon set of decision trees for the COMPAS, Coupon, Wine Quality, and Wisconsin datasets, and compare the total number of decision trees in each set to the number of unique DNF forms within each set. ... We present this measure of Rashomon set size averaged over 5 folds of each dataset. ... For the Missing Data Section, we split datasets into 5 folds, and binarized according to Threshold Guessing from (McTavish et al., 2022), using the thresholds selected based on 40 boosted decision stumps.
Hardware Specification No The paper does not explicitly describe any specific hardware used for running its experiments, such as GPU models, CPU models, or cloud computing instance types. It only mentions computational constraints in general terms.
Software Dependencies No The paper mentions using "SKLearn's Decision Tree implementation (Pedregosa et al., 2011)", "TreeFARMS (Xin et al., 2022)", GOSDT (Lin et al., 2020), and dl85 (Aglin et al., 2020), but does not provide specific version numbers for these software components or any other libraries.
Experiment Setup Yes We use TreeFARMS (Xin et al., 2022) with maximum depth 3 and a standard per-leaf penalty of 0.01, identifying all trees within 0.02 of the optimal training objective. ... In our Q-learner, we used a discount factor of 0.9, a learning rate of 0.1, and an exploration rate of 0.5, with each term defined as in (Watkins, 1989).
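The MCAR procedure quoted under Research Type (each feature of each sample removed independently with probability p) can be sketched in a few lines. The function name, NaN encoding of missingness, and NumPy-array input are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def apply_mcar(X, p, rng=None):
    """Return a copy of X in which each entry is independently
    replaced by NaN with probability p (Missing Completely At Random)."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float).copy()
    mask = rng.random(X.shape) < p  # independent Bernoulli(p) per entry
    X[mask] = np.nan
    return X

X = np.arange(12.0).reshape(4, 3)
X_missing = apply_mcar(X, p=0.3, rng=0)  # same shape, some entries NaN
```

Because each entry is masked independently, the expected fraction of NaNs is exactly p, matching the sweep over p described for Figure 6.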
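The idea behind Algorithm 1 (Compute DNF Representation from Tree) can be illustrated minimally: each leaf predicting the positive class contributes one conjunction, namely the split conditions along its root-to-leaf path, and the DNF is the disjunction of those conjunctions. The nested-dict tree encoding below is an assumption for illustration, not the paper's data structure:

```python
def tree_to_dnf(node, path=()):
    """Collect one conjunction per positive-class leaf.
    Internal nodes: {"feat": i, "left": subtree, "right": subtree},
    where left is taken when feature i is False and right when True.
    Leaves: {"label": 0 or 1}. Each conjunction is a tuple of
    (feature_index, required_value) literals."""
    if "label" in node:
        return [path] if node["label"] == 1 else []
    i = node["feat"]
    return (tree_to_dnf(node["left"], path + ((i, False),)) +
            tree_to_dnf(node["right"], path + ((i, True),)))

# Tree predicting 1 exactly when x0 is True and x1 is False:
tree = {"feat": 0,
        "left": {"label": 0},
        "right": {"feat": 1,
                  "left": {"label": 1},
                  "right": {"label": 0}}}
dnf = tree_to_dnf(tree)  # one conjunction: x0 AND NOT x1
```

Two structurally different trees that yield the same (simplified) DNF are predictively equivalent, which is the property the paper's algorithms exploit.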
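The Q-learner hyperparameters quoted under Experiment Setup (discount 0.9, learning rate 0.1, exploration rate 0.5) slot into the standard tabular update of Watkins (1989). The state/action encoding and function names below are assumptions for illustration, not the paper's implementation:

```python
import random
from collections import defaultdict

GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.5  # values quoted in the report

Q = defaultdict(float)  # maps (state, action) -> estimated value, default 0.0

def choose_action(state, actions, rng=random):
    """Epsilon-greedy: explore with probability EPSILON, else exploit."""
    if rng.random() < EPSILON:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, next_actions):
    """Tabular Q-learning update:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# One illustrative transition: from an empty table,
# Q(s0,a0) becomes 0.1 * (1.0 + 0.9 * 0.0 - 0.0) = 0.1
q_update("s0", "a0", reward=1.0, next_state="s1", next_actions=["a0", "a1"])
```

Algorithm 6 in the paper additionally initializes this Q-table via path traversal rather than starting from all zeros; the sketch above covers only the update rule and the stated hyperparameters.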