Sensitivity Verification for Additive Decision Tree Ensembles
Authors: Arhaan Ahmad, Tanay Tayal, Ashutosh Gupta, S. Akshay
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Next, we provide a novel encoding of the problem using pseudo-Boolean constraints. Based on this encoding, we develop a tunable algorithm to perform sensitivity analysis, which can trade off precision for running time. We implement our algorithm and study its performance on a suite of GBDT benchmarks from the literature. Our experiments show the practical utility of our approach and its improved performance compared to existing approaches. |
| Researcher Affiliation | Academia | Arhaan Ahmad, Tanay V. Tayal, Ashutosh Gupta & S. Akshay Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai, India. EMAIL |
| Pseudocode | No | The paper describes a novel encoding of the sensitivity problem using pseudo-Boolean constraints and an algorithm developed based on this encoding. It explains the steps for encoding inputs, trees, and outputs through mathematical constraints. However, there is no clearly labeled figure, block, or section explicitly titled "Pseudocode" or "Algorithm" presenting these steps in a structured, code-like format. |
| Open Source Code | Yes | In this section, we present our tool, SENSPB, which implements the above method for p-sensitivity checking. The tool is developed in Python and utilizes Z3 (de Moura & Bjørner, 2008) as its backend pseudo-Boolean solver. ... https://github.com/Arhaan/SensPB |
| Open Datasets | Yes | To assess our method, we begin by running our tool on a set of XGBoost models from Chen et al. (2019b). Additionally, to evaluate the performance of our tool, we train XGBoost models with varying numbers of ensemble trees on 100,000 randomly generated data samples. ... Table 1: Times taken for verifying or countering sensitivity of all singular feature sets. The Min, Max and Averages in SENSPB times are taken by running the tool with different features of the benchmark tree ensembles as the sensitive feature. More information on these experiments is available in Appendix B. |
| Dataset Splits | No | The paper mentions using "a set of XGBoost models from Chen et al. (2019b)" and training "XGBoost models with varying numbers of ensemble trees on 100,000 randomly generated data samples." While these indicate the datasets used for training or benchmarks, the paper does not specify how these datasets are split into training, validation, or test sets for the experiments conducted in this paper, nor does it refer to predefined standard splits with citations that include such details. |
| Hardware Specification | Yes | We ran the experiments on an Ubuntu machine with 20 1.3GHz cores, which has 64GB RAM. |
| Software Dependencies | No | The tool is developed in Python and utilizes Z3 (de Moura & Bjørner, 2008) as its backend pseudo-Boolean solver. ... We used our own implementation of the SMT-based approach with Z3 (de Moura & Bjørner, 2008) as the SMT solver. While Python is mentioned as the development language and Z3 as a solver, specific version numbers for Python, Z3, or any other libraries or frameworks are not provided. |
| Experiment Setup | Yes | In our experiments, we have set gap p = 0.15 and precision α = 10 |#Trees|. For the SMT solver-based approach, our experimental setup is the same as SENSPB and we report the average time taken. More experiments can be found in Appendix C. |
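The encoding the paper describes expresses each tree with Boolean leaf indicators and the ensemble output as a weighted sum of those indicators, constrained by pseudo-Boolean inequalities (solved in the paper via Z3). The sketch below is a minimal, illustrative stand-in: the variable names, leaf weights, and brute-force "solver" are all hypothetical, not the paper's actual encoding or SENSPB code.

```python
from itertools import product

# Illustrative 2-tree ensemble, each tree with 2 leaves.
# Boolean variables 0..3 are leaf indicators (t0_l0, t0_l1, t1_l0, t1_l1).
# Integer-scaled leaf values (hypothetical).
WEIGHTS = [3, 1, 2, 5]

def satisfies(bits, terms, bound, op):
    """Check one pseudo-Boolean constraint: sum(coeff * b_var) OP bound."""
    s = sum(coeff * bits[var] for var, coeff in terms)
    return s <= bound if op == "<=" else s == bound

def exists_assignment(constraints, n_vars=4):
    """Brute-force stand-in for a PB solver: is any 0/1 assignment feasible?"""
    return any(all(satisfies(bits, *c) for c in constraints)
               for bits in product((0, 1), repeat=n_vars))

# Constraints: exactly one active leaf per tree, ensemble score at most 4.
cons = [
    ([(0, 1), (1, 1)], 1, "=="),                      # tree 0: one leaf active
    ([(2, 1), (3, 1)], 1, "=="),                      # tree 1: one leaf active
    ([(v, WEIGHTS[v]) for v in range(4)], 4, "<="),   # output threshold
]
```

A real PB solver (such as Z3's `PbEq`/`PbLe` constraints, which the paper uses as a backend) replaces the exponential enumeration here; the constraints themselves have the same shape.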
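The gap parameter p = 0.15 in the setup bounds how much the ensemble output may change when only the sensitive feature is perturbed. A naive brute-force illustration of that question on a toy additive stump ensemble is sketched below; the trees, grid, and feature count are invented for the example, and this is not the paper's algorithm, which encodes the check symbolically rather than enumerating inputs.

```python
from itertools import product

# Toy additive ensemble of stumps: each tree is (feature, threshold,
# left_value, right_value); output is the sum of per-tree leaf values.
TREES = [(0, 0.5, 0.1, 0.3), (1, 0.5, 0.0, 0.1), (0, 0.8, 0.05, 0.1)]

def predict(x):
    """Additive ensemble output for input vector x."""
    return sum(l if x[f] <= t else r for f, t, l, r in TREES)

def is_sensitive(sensitive_feature, p, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Naive check: can changing ONLY `sensitive_feature` shift the
    ensemble output by at least the gap p?  Enumerates a finite grid."""
    n_features = 2
    for x in product(grid, repeat=n_features):
        for v in grid:
            y = list(x)
            y[sensitive_feature] = v
            if abs(predict(list(x)) - predict(y)) >= p:
                return True  # witness pair found: ensemble is sensitive
    return False
```

With these toy trees, feature 0 can shift the output by 0.25 ≥ 0.15 while feature 1 can shift it by at most 0.1, so only feature 0 is reported sensitive at gap p = 0.15.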