Sensitivity Verification for Additive Decision Tree Ensembles
Authors: Arhaan Ahmad, Tanay Tayal, Ashutosh Gupta, S. Akshay
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Next, we provide a novel encoding of the problem using pseudo-Boolean constraints. Based on this encoding, we develop a tunable algorithm to perform sensitivity analysis, which can trade off precision for running time. We implement our algorithm and study its performance on a suite of GBDT benchmarks from the literature. Our experiments show the practical utility of our approach and its improved performance compared to existing approaches. |
| Researcher Affiliation | Academia | Arhaan Ahmad, Tanay V. Tayal, Ashutosh Gupta & S. Akshay Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai, India. EMAIL |
| Pseudocode | No | The paper describes a novel encoding of the sensitivity problem using pseudo-Boolean constraints and an algorithm developed based on this encoding. It explains the steps for encoding inputs, trees, and outputs through mathematical constraints. However, there is no clearly labeled figure, block, or section explicitly titled "Pseudocode" or "Algorithm" presenting these steps in a structured, code-like format. |
| Open Source Code | Yes | In this section, we present our tool, SENSPB, which implements the above method for p-sensitivity checking. The tool is developed in Python and utilizes Z3 (de Moura & Bjørner, 2008) as its backend pseudo-Boolean solver. ... https://github.com/Arhaan/SensPB |
| Open Datasets | Yes | To assess our method, we begin by running our tool on a set of XGBoost models from Chen et al. (2019b). Additionally, to evaluate the performance of our tool, we train XGBoost models with varying numbers of ensemble trees on 100,000 randomly generated data samples. ... Table 1: Times taken for verifying or countering sensitivity of all singular feature sets. The Min, Max and Averages in SENSPB times are taken by running the tool with different features of the benchmark tree ensembles as the sensitive feature. More information on these experiments is available in Appendix B. |
| Dataset Splits | No | The paper mentions using "a set of XGBoost models from Chen et al. (2019b)" and training "XGBoost models with varying numbers of ensemble trees on 100,000 randomly generated data samples." While these indicate the datasets used for training or benchmarks, the paper does not specify how these datasets are split into training, validation, or test sets for the experiments conducted in this paper, nor does it refer to predefined standard splits with citations that include such details. |
| Hardware Specification | Yes | We ran the experiments on an Ubuntu machine with 20 1.3GHz cores, which has 64GB RAM. |
| Software Dependencies | No | The tool is developed in Python and utilizes Z3 (de Moura & Bjørner, 2008) as its backend pseudo-Boolean solver. ... We used our own implementation of the SMT-based approach with Z3 (de Moura & Bjørner, 2008) as the SMT solver. While Python is mentioned as the development language and Z3 as a solver, specific version numbers for Python, Z3, or any other libraries or frameworks are not provided. |
| Experiment Setup | Yes | In our experiments, we have set gap p = 0.15 and precision α = 10 |#Trees|. For the SMT solver-based approach, our experimental setup is the same as SENSPB and we report the average time taken. More experiments can be found in Appendix C. |
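The encoding the paper describes expresses each tree with Boolean leaf indicators and the ensemble output as a weighted sum of those indicators, constrained by pseudo-Boolean inequalities (solved in the paper via Z3). The sketch below is a minimal, illustrative stand-in: the variable names, leaf weights, and brute-force "solver" are all hypothetical, not the paper's actual encoding or SENSPB code.

```python
from itertools import product

# Illustrative 2-tree ensemble, each tree with 2 leaves.
# Boolean variables 0..3 are leaf indicators (t0_l0, t0_l1, t1_l0, t1_l1).
# Integer-scaled leaf values (hypothetical).
WEIGHTS = [3, 1, 2, 5]

def satisfies(bits, terms, bound, op):
    """Check one pseudo-Boolean constraint: sum(coeff * b_var) OP bound."""
    s = sum(coeff * bits[var] for var, coeff in terms)
    return s <= bound if op == "<=" else s == bound

def exists_assignment(constraints, n_vars=4):
    """Brute-force stand-in for a PB solver: is any 0/1 assignment feasible?"""
    return any(all(satisfies(bits, *c) for c in constraints)
               for bits in product((0, 1), repeat=n_vars))

# Constraints: exactly one active leaf per tree, ensemble score at most 4.
cons = [
    ([(0, 1), (1, 1)], 1, "=="),                      # tree 0: one leaf active
    ([(2, 1), (3, 1)], 1, "=="),                      # tree 1: one leaf active
    ([(v, WEIGHTS[v]) for v in range(4)], 4, "<="),   # output threshold
]
```

A real PB solver (such as Z3's `PbEq`/`PbLe` constraints, which the paper uses as a backend) replaces the exponential enumeration here; the constraints themselves have the same shape.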
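The gap parameter p = 0.15 in the setup bounds how much the ensemble output may change when only the sensitive feature is perturbed. A naive brute-force illustration of that question on a toy additive stump ensemble is sketched below; the trees, grid, and feature count are invented for the example, and this is not the paper's algorithm, which encodes the check symbolically rather than enumerating inputs.

```python
from itertools import product

# Toy additive ensemble of stumps: each tree is (feature, threshold,
# left_value, right_value); output is the sum of per-tree leaf values.
TREES = [(0, 0.5, 0.1, 0.3), (1, 0.5, 0.0, 0.1), (0, 0.8, 0.05, 0.1)]

def predict(x):
    """Additive ensemble output for input vector x."""
    return sum(l if x[f] <= t else r for f, t, l, r in TREES)

def is_sensitive(sensitive_feature, p, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Naive check: can changing ONLY `sensitive_feature` shift the
    ensemble output by at least the gap p?  Enumerates a finite grid."""
    n_features = 2
    for x in product(grid, repeat=n_features):
        for v in grid:
            y = list(x)
            y[sensitive_feature] = v
            if abs(predict(list(x)) - predict(y)) >= p:
                return True  # witness pair found: ensemble is sensitive
    return False
```

With these toy trees, feature 0 can shift the output by 0.25 ≥ 0.15 while feature 1 can shift it by at most 0.1, so only feature 0 is reported sensitive at gap p = 0.15.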