Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Leveraging Predictive Equivalence in Decision Trees
Authors: Hayden McTavish, Zachery Boner, Jon Donnelly, Margo Seltzer, Cynthia Rudin
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our representation to several downstream machine learning tasks. Using our representation, we show that decision trees are surprisingly robust to test-time missingness of feature values; we address predictive equivalence's impact on quantifying variable importance; and we present an algorithm to optimize the cost of reaching predictions. We demonstrate empirically that decision trees rarely require additional missingness handling to predict on samples with missing data. In Figure 6, we introduce synthetic missingness (Missing Completely at Random) to a variety of real-world datasets by independently removing each feature of each sample with probability p. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Duke University, Durham, North Carolina, USA 2Department of Computer Science, University of British Columbia, Vancouver, BC, Canada. |
| Pseudocode | Yes | Algorithm 1 Compute DNF Representation from Tree, Algorithm 2 Prediction with the DNF representation, Algorithm 3 Equality Checking, Algorithm 4 BCF, Algorithm 5 Adjusted Quine-McCluskey (with particular processing of the data structures involved to allow for proof of Theorem 3.4), Algorithm 6 Initialization of Q-Learner Using Path Traversal |
| Open Source Code | Yes | Code for our algorithms and experiments can be found at https://github.com/HaydenMcT/predictiveequivalence |
| Open Datasets | Yes | We consider four datasets throughout this work and eight additional datasets in Appendix C. We refer to the primary four as COMPAS (Larson et al., 2016), Wine Quality (Cortez et al., 2009), Wisconsin (Street et al., 1993), and Coupon (Wang et al., 2017). |
| Dataset Splits | Yes | We compute the Rashomon set of decision trees for the COMPAS, Coupon, Wine Quality, and Wisconsin datasets, and compare the total number of decision trees in each set to the number of unique DNF forms within each set. ... We present this measure of Rashomon set size averaged over 5 folds of each dataset. ... For the Missing Data Section, we split datasets into 5 folds, and binarized according to Threshold Guessing from (McTavish et al., 2022), using the thresholds selected based on 40 boosted decision stumps. |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware used for running its experiments, such as GPU models, CPU models, or cloud computing instance types. It only mentions computational constraints in general terms. |
| Software Dependencies | No | The paper mentions using "SKLearn's Decision Tree implementation (Pedregosa et al., 2011)", "Tree FARMS (Xin et al., 2022)", GOSDT (Lin et al., 2020), and dl85 (Aglin et al., 2020), but does not provide specific version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | We use Tree FARMS (Xin et al., 2022) with maximum depth 3 and a standard per-leaf penalty of 0.01, identifying all trees within 0.02 of the optimal training objective. ... In our Q-learner, we used a discount factor of 0.9, a learning rate of 0.1, and an exploration rate of 0.5, with each term defined as in (Watkins, 1989). |
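The MCAR missingness scheme quoted in the Research Type row (independently removing each feature of each sample with probability p, as in the paper's Figure 6) can be sketched as follows. This is an illustrative reimplementation, not the authors' code; the function name `inject_mcar` is an assumption.

```python
import numpy as np

def inject_mcar(X, p, rng=None):
    """Introduce Missing Completely at Random values: each entry of X is
    independently replaced by NaN with probability p. Hypothetical helper,
    sketching the scheme described for the paper's Figure 6."""
    rng = np.random.default_rng(rng)
    X = X.astype(float).copy()          # copy so the original data is untouched
    mask = rng.random(X.shape) < p      # Bernoulli(p) draw per entry
    X[mask] = np.nan
    return X

# Example: corrupt a small feature matrix with 30% missingness
X = np.arange(12, dtype=float).reshape(4, 3)
X_missing = inject_mcar(X, p=0.3, rng=0)
```

Because each entry is masked independently, the expected fraction of missing values is exactly p, matching the experimental description.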
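The Q-learner hyperparameters quoted in the Experiment Setup row (discount 0.9, learning rate 0.1, exploration 0.5, with terms defined as in Watkins, 1989) correspond to a standard tabular Q-learning update. Below is a minimal sketch using those values; the function names `q_update` and `epsilon_greedy` are illustrative, and the paper's path-traversal initialization (its Algorithm 6) is not reproduced here.

```python
import numpy as np

# Hyperparameters quoted from the paper (Watkins, 1989 definitions)
GAMMA = 0.9    # discount factor
ALPHA = 0.1    # learning rate
EPSILON = 0.5  # exploration rate

def q_update(Q, s, a, r, s_next, terminal=False):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r if terminal else r + GAMMA * np.max(Q[s_next])
    Q[s, a] += ALPHA * (target - Q[s, a])
    return Q

def epsilon_greedy(Q, s, rng):
    """With probability EPSILON pick a uniformly random action, else greedy."""
    if rng.random() < EPSILON:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))
```

With an all-zero table, a terminal reward of 1.0 moves `Q[s, a]` to `ALPHA * 1.0 = 0.1`, illustrating how the quoted learning rate scales each update.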