Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Leveraging Predictive Equivalence in Decision Trees

Authors: Hayden McTavish, Zachery Boner, Jon Donnelly, Margo Seltzer, Cynthia Rudin

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We apply our representation to several downstream machine learning tasks. Using our representation, we show that decision trees are surprisingly robust to test-time missingness of feature values; we address predictive equivalence's impact on quantifying variable importance; and we present an algorithm to optimize the cost of reaching predictions. We demonstrate empirically that decision trees rarely require additional missingness handling to predict on samples with missing data. In Figure 6, we introduce synthetic missingness (Missing Completely at Random) to a variety of real-world datasets by independently removing each feature of each sample with probability p.
Researcher Affiliation Academia 1Department of Computer Science, Duke University, Durham, North Carolina, USA 2Department of Computer Science, University of British Columbia, Vancouver, BC, Canada.
Pseudocode Yes Algorithm 1 Compute DNF Representation from Tree, Algorithm 2 Prediction with the DNF representation, Algorithm 3 Equality Checking, Algorithm 4 BCF, Algorithm 5 Adjusted Quine-McCluskey (with particular processing of the data structures involved to allow for proof of Theorem 3.4), Algorithm 6 Initialization of Q-Learner Using Path Traversal
Open Source Code Yes Code for our algorithms and experiments can be found at https://github.com/HaydenMcT/predictiveequivalence
Open Datasets Yes We consider four datasets throughout this work and eight additional datasets in Appendix C. We refer to the primary four as COMPAS (Larson et al., 2016), Wine Quality (Cortez et al., 2009), Wisconsin (Street et al., 1993), and Coupon (Wang et al., 2017).
Dataset Splits Yes We compute the Rashomon set of decision trees for the COMPAS, Coupon, Wine Quality, and Wisconsin datasets, and compare the total number of decision trees in each set to the number of unique DNF forms within each set. ... We present this measure of Rashomon set size averaged over 5 folds of each dataset. ... For the Missing Data Section, we split datasets into 5 folds, and binarized according to Threshold Guessing from (McTavish et al., 2022), using the thresholds selected based on 40 boosted decision stumps.
Hardware Specification No The paper does not explicitly describe any specific hardware used for running its experiments, such as GPU models, CPU models, or cloud computing instance types. It only mentions computational constraints in general terms.
Software Dependencies No The paper mentions using "SKLearn's Decision Tree implementation (Pedregosa et al., 2011)", "TreeFARMS (Xin et al., 2022)", GOSDT (Lin et al., 2020), and dl85 (Aglin et al., 2020), but does not provide specific version numbers for these software components or any other libraries.
Experiment Setup Yes We use TreeFARMS (Xin et al., 2022) with maximum depth 3 and a standard per-leaf penalty of 0.01, identifying all trees within 0.02 of the optimal training objective. ... In our Q-learner, we used a discount factor of 0.9, a learning rate of 0.1, and an exploration rate of 0.5, with each term defined as in (Watkins, 1989).
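The MCAR procedure quoted under Research Type (each feature of each sample removed independently with probability p) can be sketched in a few lines. The function name, NaN encoding of missingness, and NumPy-array input are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def apply_mcar(X, p, rng=None):
    """Return a copy of X in which each entry is independently
    replaced by NaN with probability p (Missing Completely At Random)."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float).copy()
    mask = rng.random(X.shape) < p  # independent Bernoulli(p) per entry
    X[mask] = np.nan
    return X

X = np.arange(12.0).reshape(4, 3)
X_missing = apply_mcar(X, p=0.3, rng=0)  # same shape, some entries NaN
```

Because each entry is masked independently, the expected fraction of NaNs is exactly p, matching the sweep over p described for Figure 6.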
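The idea behind Algorithm 1 (Compute DNF Representation from Tree) can be illustrated minimally: each leaf predicting the positive class contributes one conjunction, namely the split conditions along its root-to-leaf path, and the DNF is the disjunction of those conjunctions. The nested-dict tree encoding below is an assumption for illustration, not the paper's data structure:

```python
def tree_to_dnf(node, path=()):
    """Collect one conjunction per positive-class leaf.
    Internal nodes: {"feat": i, "left": subtree, "right": subtree},
    where left is taken when feature i is False and right when True.
    Leaves: {"label": 0 or 1}. Each conjunction is a tuple of
    (feature_index, required_value) literals."""
    if "label" in node:
        return [path] if node["label"] == 1 else []
    i = node["feat"]
    return (tree_to_dnf(node["left"], path + ((i, False),)) +
            tree_to_dnf(node["right"], path + ((i, True),)))

# Tree predicting 1 exactly when x0 is True and x1 is False:
tree = {"feat": 0,
        "left": {"label": 0},
        "right": {"feat": 1,
                  "left": {"label": 1},
                  "right": {"label": 0}}}
dnf = tree_to_dnf(tree)  # one conjunction: x0 AND NOT x1
```

Two structurally different trees that yield the same (simplified) DNF are predictively equivalent, which is the property the paper's algorithms exploit.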
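The Q-learner hyperparameters quoted under Experiment Setup (discount 0.9, learning rate 0.1, exploration rate 0.5) slot into the standard tabular update of Watkins (1989). The state/action encoding and function names below are assumptions for illustration, not the paper's implementation:

```python
import random
from collections import defaultdict

GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.5  # values quoted in the report

Q = defaultdict(float)  # maps (state, action) -> estimated value, default 0.0

def choose_action(state, actions, rng=random):
    """Epsilon-greedy: explore with probability EPSILON, else exploit."""
    if rng.random() < EPSILON:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, next_actions):
    """Tabular Q-learning update:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# One illustrative transition: from an empty table,
# Q(s0,a0) becomes 0.1 * (1.0 + 0.9 * 0.0 - 0.0) = 0.1
q_update("s0", "a0", reward=1.0, next_state="s1", next_actions=["a0", "a1"])
```

Algorithm 6 in the paper additionally initializes this Q-table via path traversal rather than starting from all zeros; the sketch above covers only the update rule and the stated hyperparameters.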