Explanation Shift: How Did the Distribution Shift Impact the Model?
Authors: Carlos Mougan, Klaus Broelemann, Gjergji Kasneci, Thanassis Tiropanis, Steffen Staab
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide theoretical and experimental evidence and demonstrate the effectiveness of our approach on synthetic and real data. Additionally, we release an open-source Python package, skshift, which implements our method and provides usage tutorials for further reproducibility. |
| Researcher Affiliation | Collaboration | Carlos Mougan (AI Office, European Commission & University of Southampton); Klaus Broelemann (Schufa Holding AG, Germany); Gjergji Kasneci (Schufa Holding AG & Technical University of Munich); Thanassis Tiropanis (University of Southampton); Steffen Staab (University of Stuttgart & University of Southampton) |
| Pseudocode | No | The paper describes methods and uses mathematical formulations but does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Additionally, we release an open-source Python package, skshift, which implements our method and provides usage tutorials for further reproducibility. To ensure reproducibility, we make the data, code repositories, and experiments publicly available at https://github.com/cmougan/ExplanationShift. The open-source Python package skshift is available at https://skshift.readthedocs.io/ |
| Open Datasets | Yes | In the main body of the paper we base our comparisons on the UCI Adult Income dataset Dua & Graff (2017) and on synthetic data. In the Appendix, we extend experiments to several other datasets, which confirm our findings: ACS Travel Time, ACS Employment, Stackoverflow dataset (Stackoverflow, 2019). |
| Dataset Splits | Yes | The model gψ is trained each time on each state using only the covariates D_X^new in the absence of the label, and a 50/50 random train-test split evaluates its performance. |
| Hardware Specification | Yes | Experiments were run on a 4-vCPU server with 32 GB RAM. |
| Software Dependencies | Yes | We used shap version 0.41.0 and lime version 0.2.0.1 as software packages. |
| Experiment Setup | Yes | We train fθ on D^{tr,ρ=0} using a gradient-boosted decision tree, while gψ : S(fθ, D_X^{val,ρ}) → {0, 1} is trained on datasets with different values of ρ; for gψ we use a logistic regression. In this experiment, we changed the hyperparameters of the original model: for the decision tree, we varied the depth of the tree; for the gradient-boosted decision trees, we changed the number of estimators; and for the random forest, both hyperparameters. |
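The pipeline quoted above (train fθ, compute explanations S(fθ, ·), then fit a detector gψ on a 50/50 split to separate validation from new-distribution explanations) can be sketched as follows. This is a minimal illustration, not the authors' skshift implementation: it uses a linear fθ so that SHAP values have the closed form φ_j = w_j · (x_j − E[x_j]) (Linear SHAP under feature independence), avoiding the shap dependency, and all data, feature names, and shift magnitudes are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2000

# Synthetic training data (stand-in for the paper's datasets).
X_tr = rng.normal(size=(n, 2))
y_tr = (X_tr[:, 0] + X_tr[:, 1] + 0.3 * rng.normal(size=n) > 0).astype(int)

# f_theta: here a linear model so explanations are exact and dependency-free.
f_theta = LogisticRegression().fit(X_tr, y_tr)
mu = X_tr.mean(axis=0)  # background expectation E[x_j]

def linear_shap(model, X, background_mean):
    """Exact SHAP values for a linear model (log-odds scale),
    assuming independent features: phi_j = w_j * (x_j - E[x_j])."""
    return model.coef_[0] * (X - background_mean)

def detector_auc(S_val, S_new, seed=0):
    """Fit g_psi (logistic regression) to distinguish explanation
    distributions; evaluate with a 50/50 random train-test split."""
    S = np.vstack([S_val, S_new])
    z = np.concatenate([np.zeros(len(S_val)), np.ones(len(S_new))])
    S_a, S_b, z_a, z_b = train_test_split(
        S, z, test_size=0.5, random_state=seed, stratify=z)
    g_psi = LogisticRegression().fit(S_a, z_a)
    return roc_auc_score(z_b, g_psi.predict_proba(S_b)[:, 1])

# Validation data from the training distribution, plus one shifted sample
# (feature 0 mean-shifted by +1.5 -- an arbitrary illustrative shift).
X_val = rng.normal(size=(n, 2))
X_new = rng.normal(size=(n, 2))
X_new[:, 0] += 1.5
X_ctrl = rng.normal(size=(n, 2))  # control: no shift

auc_shift = detector_auc(linear_shap(f_theta, X_val, mu),
                         linear_shap(f_theta, X_new, mu))
auc_ctrl = detector_auc(linear_shap(f_theta, X_val, mu),
                        linear_shap(f_theta, X_ctrl, mu))
print(f"AUC under shift: {auc_shift:.2f}, AUC without shift: {auc_ctrl:.2f}")
```

Under the shift the detector's AUC rises well above 0.5, while on the unshifted control it stays near 0.5, which is the signal the paper's gψ exploits. The released skshift package and the repository linked above provide the authors' actual implementation and tutorials.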