Reconciling Privacy and Explainability in High-Stakes: A Systematic Inquiry

Authors: Supriya Manna, Niladri Sett

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments and report our findings: we consider three types of widely used CNN models, six distinct ϵ values to train them on, and five popular post-hoc methods for our experiment. We report our findings for the three networks, DenseNet-121, ResNet-34, and EfficientNet-V2, in Figures 2, 1, and 3, respectively.
Researcher Affiliation | Academia | Supriya Manna, SRM University AP, India; Niladri Sett, SRM University AP, India
Pseudocode | No | The paper describes methods and mathematical formulations but does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | We will release the weights of these models upon publication. The paper mentions utilizing several publicly available third-party libraries (Opacus, Captum, Simtorch, PyRKHSstats), but does not provide access to the authors' own implementation code for the methodology described in the paper.
Open Datasets | Yes | Our dataset comprises 2,000 Pneumonia cases sourced from the Chest X-ray dataset (Patel, 2020) and 2,000 TB cases randomly sampled from the NIAID TB Portal Program dataset (National Institute of Allergy and Infectious Diseases). To create the Normal subset, we include an equal split of 1,000 unaffected samples from the aforementioned Chest X-ray dataset (Patel, 2020) and 1,000 unaffected samples from Rahman et al. (2020), totaling 2,000 normal cases. After our primary experiment, we also, within the limits of feasibility, experimented with another benchmark dataset: CIFAR-10.
Dataset Splits | No | For evaluation, the test set contains 200 images from each class. The paper specifies the test-set size but provides neither explicit training/validation splits nor a splitting methodology for the Chest X-ray/TB dataset, and gives no explicit splits for CIFAR-10.
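Because only the per-class totals (2,000 images each) and the 200-per-class test set are reported, anyone reproducing the work must construct their own split. A minimal stdlib sketch under those stated counts, assuming file lists per class and sending the remaining 1,800 images per class to training (no train/validation division is made, since the paper specifies none):

```python
import random


def make_split(files_by_class, test_per_class=200, seed=0):
    """Hold out a fixed number of images per class for testing.

    files_by_class: dict mapping class name -> list of image paths.
    Returns (train, test) dicts keyed by class name.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = {}, {}
    for cls, files in files_by_class.items():
        shuffled = list(files)
        rng.shuffle(shuffled)
        test[cls] = shuffled[:test_per_class]
        train[cls] = shuffled[test_per_class:]
    return train, test


# Hypothetical file lists matching the reported class sizes (2,000 each).
data = {cls: [f"{cls}/{i}.png" for i in range(2000)]
        for cls in ("pneumonia", "tb", "normal")}
train, test = make_split(data)
```

The seed is the only choice that affects the result, so recording it alongside the code is enough to make the split reproducible.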
Hardware Specification | Yes | We run all our experiments on an NVIDIA DGX workstation, leveraging one Tesla V100 32 GB GPU.
Software Dependencies | No | We wrote all experiments in Python 3.10. We utilized the Opacus library for DP training (https://opacus.ai). We employ the off-the-shelf, publicly available implementations of the explainers from the Captum library (Kokhlikyan et al., 2020). For (d)CKA, we utilized the publicly available Simtorch package (https://github.com/ykumards/simtorch) with default (hyper)parameter selection. For statistical testing with HSIC, we utilized the publicly available PyRKHSstats package (https://github.com/Black-Swan-ICL/PyRKHSstats) with default (hyper)parameter selection, except for the default p-value cutoff of 0.01. The paper mentions Python 3.10 but does not specify version numbers for Opacus, Captum, Simtorch, or PyRKHSstats.
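Since no library versions are reported, a reproducer would need to pin them before results can be compared. A hypothetical requirements file sketching the stack named above; the version placeholders are deliberately left unfilled because the paper does not state them:

```
# requirements.txt (versions unspecified in the paper; fill in before use)
opacus==<version>
captum==<version>
simtorch @ git+https://github.com/ykumards/simtorch
PyRKHSstats @ git+https://github.com/Black-Swan-ICL/PyRKHSstats
```

Pinning from a known-working environment (e.g. via `pip freeze`) is the usual way to recover such missing version information.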
Experiment Setup | Yes | We train the non-private and private models with all hyperparameters fixed (batch size: 128, lr: 0.001, delta for DP: 0.001) except for the number of epochs, as private models need more computation to learn due to the heavy regularization DP introduces into training (Ponomareva et al., 2023). Following Ponomareva et al. (2023), we initialised all our models (both non-private and private) with publicly available ImageNet pre-trained weights for better convergence. Furthermore, to make a fair comparison, we fixed the values of all hyperparameters across the private models with different ϵ. We set the number of epochs to 50 for all private models, which yielded competitive accuracy across model types. We replace the BatchNorm layers with GroupNorm layers in all non-private models and their private counterparts, as GroupNorm does not alter the base architecture drastically, scales well, and adheres strictly to the privacy principle.
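The quoted setup amounts to one shared configuration with only ϵ varying across runs, which can be captured explicitly. A sketch using the stated values; the six ϵ values themselves are not quoted in this report, so the `EPSILONS` list below is an illustrative placeholder only:

```python
# Shared hyperparameters from the quoted setup (fixed across all runs).
BASE_CONFIG = {
    "batch_size": 128,
    "lr": 0.001,
    "delta": 0.001,            # DP target delta
    "epochs": 50,              # fixed for all private models
    "norm_layer": "GroupNorm", # BatchNorm replaced for DP compatibility
    "init": "imagenet_pretrained",
}

# Hypothetical placeholders: the paper trains with six distinct epsilon
# values, but this report does not quote which ones.
EPSILONS = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]

# One training configuration per privacy budget; everything else is shared.
runs = [{**BASE_CONFIG, "epsilon": eps} for eps in EPSILONS]
```

Enumerating runs this way makes the "everything fixed except ϵ" comparison auditable: any accidental per-run deviation would show up as a difference outside the `epsilon` key.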