Provably Accurate Shapley Value Estimation via Leverage Score Sampling
Authors: Christopher Musco, R. Teal Witter
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Beyond theoretical guarantees, we find that Leverage SHAP achieves an approximately 50% reduction in error compared to the highly optimized implementation of Kernel SHAP in the widely used SHAP library [Lundberg & Lee, 2017]. More extensive experiments are included in Section 5. We find that the improvement of Leverage SHAP over Kernel SHAP is especially substantial in settings where n is large in comparison to m and when we only have access to noisy estimates of the Shapley values. |
| Researcher Affiliation | Academia | Christopher Musco (New York University); R. Teal Witter (New York University) |
| Pseudocode | Yes | Algorithm 1 Leverage SHAP Algorithm 2 Bernoulli Sample(n, c): Efficient Bernoulli Sampling Algorithm 3 Combo(n, s, i): Compute the ith Combination in Lexicographic Order |
| Open Source Code | Yes | All of our code is written in Python and can be found on GitHub: github.com/rtealwitter/leverageshap |
| Open Datasets | Yes | We run our experiments on eight popular datasets from the SHAP library (Lundberg & Lee, 2017). We find that Leverage SHAP outperforms the highly optimized Kernel SHAP, achieving a 50% reduction in error on average (see Table 2). In addition, we find that Leverage SHAP consistently outperforms Permutation SHAP (see the additional experiments in Section G). |
| Dataset Splits | No | The paper mentions running experiments on eight popular datasets but does not specify how these datasets were split into training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | All of our code is written in Python and can be found on GitHub. We use the SHAP library for the Optimized Kernel SHAP implementation, Tree SHAP, and the datasets. We use XGBoost for training and evaluating trees, under the default parameters. While Python, the SHAP library, and XGBoost are mentioned, specific version numbers for these software components are not provided. |
| Experiment Setup | Yes | We use XGBoost for training and evaluating trees, under the default parameters. Figure 3 plots the performance for Kernel SHAP, the Optimized Kernel SHAP, and Leverage SHAP as the number of samples varies (we set m = 5n, 10n, 20n, ..., 160n). |