Data-centric Prediction Explanation via Kernelized Stein Discrepancy
Authors: Mahtab Sarvmaili, Hassan Sajjad, Ga Wu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct several qualitative and quantitative experiments to demonstrate various properties of HD-Explain and compare it with the existing example-based solutions. |
| Researcher Affiliation | Academia | Mahtab Sarvmaili, Hassan Sajjad, Ga Wu Department of Computer Science Dalhousie University EMAIL |
| Pseudocode | Yes | Appendix L (HD-EXPLAIN: EXPLANATION PROCESS): "The following algorithm shows HD-Explain in pseudocode. Algorithm 1: HD-Explain" |
| Open Source Code | Yes | Source code is available at https://github.com/MahtabSarvmaili/HDExplain. |
| Open Datasets | Yes | Datasets: We consider multiple disease classification tasks where diagnosis explanation is highly desired. We also introduced synthetic and benchmark classification datasets to deliver the main idea without the need for medical background knowledge. Concretely, we use CIFAR-10 (32×32×3), Brain Tumor (Magnetic Resonance Imaging, 128×128×3), Ovarian Cancer (Histopathology Images, 128×128×3) datasets, and SVHN (32×32×3). More details are listed in Appendix F. Table 2 (summary of datasets used in the paper, flattened in extraction): CIFAR-10, classification benchmark, image, 60,000 samples, 32×32×3, 10 classes, No, Yes; Brain Tumor MRI, benchmark, image, 7,023 samples, 128×128×3, 4 classes, Yes, Yes. |
| Dataset Splits | No | The paper describes how augmented test points were generated for evaluation, stating "We created 30 augmented test points for each training data point (>10,000 data points) in each dataset, resulting in more than 300,000 independent runs." However, it does not explicitly provide the train/validation/test splits for the original datasets (e.g., CIFAR-10, SVHN) in terms of percentages, counts, or references to predefined splits, beyond mentioning "CIFAR-10 is a small benchmark data with 50,000 training samples." |
| Hardware Specification | Yes | Appendix H (HARDWARE SETUP): "We ran all our experiments on a machine equipped with a GTX 1080 Ti GPU, a second-generation Ryzen 5 processor, and 32 GB of memory." |
| Software Dependencies | No | The paper mentions using "ResNet-18 as the backbone model architecture" but does not specify versions for any programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | No | The paper states "Our experiments use ResNet-18 as the backbone model architecture (with around 11 million trainable parameters) for all image datasets" and mentions data augmentations were conducted "including random cropping, rotation, shifting, horizontal flipping, and noise injection." However, it does not provide specific hyperparameters such as learning rate, batch size, optimizer details, number of epochs, or other system-level training configurations needed to reproduce the experiments. |
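The pseudocode row above quotes only the title of Algorithm 1, not its body. As a rough, hedged illustration of the general technique the paper's title names (a kernelized Stein discrepancy used to score training points as explanations), the sketch below ranks training points by a Stein-kernel similarity to a test point. All function names, the RBF kernel choice, and the use of precomputed score vectors are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def rbf_kernel_and_grads(x, y, bandwidth=1.0):
    """RBF kernel k(x, y) with its gradients w.r.t. x and y, and the
    trace of the mixed second derivative (needed by the Stein kernel)."""
    diff = x - y
    sq = float(np.dot(diff, diff))
    k = np.exp(-sq / (2 * bandwidth ** 2))
    grad_x = -diff / bandwidth ** 2 * k          # d k / d x
    grad_y = diff / bandwidth ** 2 * k           # d k / d y
    trace = (x.size / bandwidth ** 2 - sq / bandwidth ** 4) * k
    return k, grad_x, grad_y, trace

def stein_kernel(x, y, score_x, score_y, bandwidth=1.0):
    """Stein kernel u_p(x, y) of kernelized Stein discrepancy.

    score_x / score_y stand for grad log p at x and y; how these scores
    are obtained from a trained classifier is an assumption here.
    """
    k, gx, gy, trace = rbf_kernel_and_grads(x, y, bandwidth)
    return (np.dot(score_x, score_y) * k
            + np.dot(score_x, gy)
            + np.dot(score_y, gx)
            + trace)

def rank_training_points(test_x, test_score, train_xs, train_scores):
    """Rank training points by Stein-kernel similarity to the test point;
    higher values are treated as more 'explanatory' (illustrative choice)."""
    vals = np.array([stein_kernel(test_x, x, test_score, s)
                     for x, s in zip(train_xs, train_scores)])
    return np.argsort(vals)[::-1], vals
```

With zero scores and unit bandwidth, `stein_kernel(x, x, 0, 0)` reduces to the trace term `d / bandwidth**2`, which is a convenient sanity check for the kernel derivatives.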