Reconciling Privacy and Explainability in High-Stakes: A Systematic Inquiry
Authors: Supriya Manna, Niladri Sett
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments and report our findings: we consider three types of widely used CNN models, six distinct ϵ values to train them on, and five popular post-hoc methods for our experiment; we report our findings for the three networks: DenseNet-121, ResNet-34, and EfficientNet-v2 in Figures 2, 1, and 3 respectively. |
| Researcher Affiliation | Academia | Supriya Manna, SRM University AP, India; Niladri Sett, SRM University AP, India |
| Pseudocode | No | The paper describes methods and mathematical formulations but does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | We will release the weights of these models upon publication. The paper mentions utilizing several third-party libraries (Opacus, Captum, Simtorch, PyRKHSstats) which are publicly available, but does not provide specific access to the authors' own implementation code for the methodology described in this paper. |
| Open Datasets | Yes | Our dataset comprises 2,000 Pneumonia cases sourced from the Chest X-ray dataset by (Patel, 2020), and 2,000 TB cases randomly sampled from the NIAID TB Portal Program dataset (National Institute of Allergy and Infectious Diseases). To create the Normal subset, we include an equal split of 1,000 unaffected Pneumonia samples from the unaffected class in the aforementioned Chest X-ray dataset (Patel, 2020) and 1,000 unaffected TB samples from Rahman et al. (Rahman et al., 2020), totaling 2,000 normal cases. However, after our primary experiment, we, within the limits of feasibility, experimented with another benchmark dataset: CIFAR-10 |
| Dataset Splits | No | For evaluation, the test set contains 200 images from each class. The paper specifies the test set size but does not provide explicit training and validation splits or splitting methodology for the Chest X-ray/TB dataset, nor explicit splits for CIFAR-10. |
| Hardware Specification | Yes | We run all our experiments on an NVIDIA DGX workstation, leveraging 1 Tesla V100 32GB GPU. |
| Software Dependencies | No | We wrote all experiments in Python 3.10. We utilized the Opacus library for DP-training (https://opacus.ai). We employ the off-the-shelf, publicly available implementations of the explainers from the Captum library (Kokhlikyan et al., 2020). For (d)CKA, we utilized the publicly available package Simtorch (https://github.com/ykumards/simtorch) with default (hyper)parameter selection. For statistical testing with HSIC, we utilized the publicly available package PyRKHSstats (https://github.com/Black-Swan-ICL/Py RKHSstats) with default (hyper)parameter selection except for the default p-value cutoff of 0.01. The paper mentions Python 3.10, but does not specify version numbers for Opacus, Captum, Simtorch, or PyRKHSstats. |
| Experiment Setup | Yes | We train the non-private and private models fixing all the hyperparameters (batch size: 128, lr: 0.001, delta (for DP): 0.001) except for the number of epochs, as private models need more computation to learn due to the heavy regularization DP introduces in the training (Ponomareva et al., 2023). Following (Ponomareva et al., 2023), we initialised all our models (both non-private and private counterparts) with publicly available pre-trained weights (ImageNet) for better convergence. Furthermore, to make a fair comparison, we fixed all the hyperparameters across the private models with different ϵ. We set the number of epochs to 50 for all private models; it yielded competitive accuracy across model types. We replace the BatchNorm layers with GroupNorm layers in all non-private models along with their private counterparts, as GroupNorm does not alter the base architecture drastically, scales well, and adheres to the privacy principle strictly. |