Explaining the Behavior of Black-Box Prediction Algorithms with Causal Learning
Authors: Numair Sani, Daniel Malinsky, Ilya Shpitser
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a simulation study that highlights key issues and demonstrates the strength of our approach. We apply a version of our proposal to two datasets: annotated image data for bird classification and annotated chest X-ray images for pneumonia detection. ... We conduct two real data experiments to demonstrate the utility of our approach. |
| Researcher Affiliation | Collaboration | Numair Sani, Sani Analytics, Mumbai, MH, India; Daniel Malinsky, Department of Biostatistics, Columbia University, New York, NY, USA; Ilya Shpitser, Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA. |
| Pseudocode | No | The paper describes methods and algorithms narratively but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using third-party tools like TETRAD freeware and implementations of LIME and SHAP, providing links to their repositories. However, it does not provide any concrete access information for the authors' own implementation code or methodology described in the paper. |
| Open Datasets | Yes | First, we study a neural network for bird classification, trained on the Caltech-UCSD 200-2011 image dataset (Wah et al., 2011). ... Second, we follow essentially the same procedure to explain the behavior of a pneumonia detection neural network, trained on a subset of the Chest X-ray8 dataset (Wang et al., 2017a). ... Both data sources are publicly available online. |
| Dataset Splits | Yes | This yields a dataset of 3538 images, which is then partitioned into training, validation, and testing datasets of 2489, 520, and 529 images respectively. ... Using the same architecture as for the previous experiment and reserving 55 images for testing, ResNet18 achieves an accuracy of 74.55%. |
| Hardware Specification | No | The paper mentions the use of ResNet18 architecture and training parameters, but it does not specify any particular hardware (e.g., GPU, CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the TETRAD package, the sklearn library for logistic regression, and implementations of LIME and SHAP. However, it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | The model is trained for 15 epochs with a batch size of 64 and using the SGD optimizer with a learning rate of 0.01 and a momentum of 0.09. Additionally, we schedule a learning rate decay with a step size of 7 and γ = 0.1. ... We run FCI on each replicate with independence test rejection threshold (a tuning parameter) set to α = .05 and α = .01 for the birds and X-ray datasets, respectively, with the knowledge constraint imposed that outcome Ŷ cannot cause any of the interpretable features. Here FCI is used with the χ2 independence test, and we limit the maximum conditioning set size to 4 for computational tractability in the birds dataset. |