A True-to-the-model Axiomatic Benchmark for Graph-based Explainers

Authors: Corrado Monti, Paolo Bajardi, Francesco Bonchi, André Panisson, Alan Perotti

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply this framework to both synthetic and real data and evaluate various state-of-the-art explainers, thus characterizing their behavior. Our findings highlight how explainers often react in a rather counter-intuitive fashion to technical details that might be easily overlooked. Our approach offers valuable insights and recommended practices for selecting the right explainer given the task at hand, and for developing new methods for explaining graph-learning models.
Researcher Affiliation | Academia | All five authors (Corrado Monti, Paolo Bajardi, Francesco Bonchi, André Panisson, Alan Perotti) are affiliated with the CENTAI Institute, Turin, Italy.
Pseudocode | No | The paper describes its algorithms and models in prose and refers to PyTorch implementations, but it does not include any clearly labeled pseudocode or algorithm blocks. For example, the formal proof of Theorem 1 is provided in Appendix A, but this is a mathematical proof, not pseudocode.
Open Source Code | Yes | We provide code to let other researchers use our framework to test or to develop new graph explainers. Our code is available at https://github.com/corradomonti/axiomatic-g-xai.
Open Datasets | Yes | Finally, we also test a real-world dataset (He & McAuley, 2016) with 786 anonymized Facebook users (with 319 binary features) and 14024 edges between them.
Dataset Splits | No | The paper mentions generating synthetic data and testing explainers on a "set of test nodes" for both the synthetic and real-world datasets. However, it does not specify explicit training/validation/test splits, percentages, or a methodology for partitioning the data in the conventional sense, since the white-box models are not trained. For the evaluation of explainers, it states "Run the explainer E on M for a set of test nodes v ∈ V" but does not specify how these test nodes are selected from a larger set or whether any portion is reserved for other purposes.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU specifications, or memory used for running the experiments. It focuses on the methodology and results without specifying the underlying computational resources.
Software Dependencies | No | The paper mentions a "PyTorch implementation" multiple times for its white-box models and refers to "Captum" for Integrated Gradients and a "GraphLIME" implementation. However, it does not provide version numbers for PyTorch, Captum, or any other software libraries or frameworks, which are necessary for reproducible software dependencies.
Experiment Setup | Yes | Datasets. Although our white boxes are not trained, our framework still needs data. For our purposes, we use an array of Erdős–Rényi graphs with 100 nodes and a set of 50 random binary features for each node. We generate an array of such datasets by varying the fraction of positive features. We opted for this kind of random graph in order to test the explainers on a networked system with topological features drastically different from the real dataset. In our experiments, we randomly select a given fraction of features as important (varying among experiments) for the model M, and we set the value of γ to 1. Fixes and warnings. ...k being a parameter that we set to 2. Fixes and warnings. ...In this work the default values of hyperparameters are used across all experiments, i.e., the regularization hyperparameter for subgraph size is 0.005 and for feature explanation is 1.
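The synthetic setup quoted above (an Erdős–Rényi graph with 100 nodes, 50 random binary features per node, and a randomly chosen subset of "important" features for the white-box model M) can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's code: the function name, the edge probability `edge_p`, and the default fractions `frac_positive` and `frac_important` are assumed values, since the paper varies these across experiments.

```python
import random

def make_synthetic_instance(n_nodes=100, n_features=50, edge_p=0.1,
                            frac_positive=0.5, frac_important=0.2, seed=0):
    """One Erdos-Renyi benchmark instance in the style described by the paper.

    edge_p, frac_positive, and frac_important are illustrative defaults,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    # Erdos-Renyi G(n, p): include each possible undirected edge
    # independently with probability edge_p.
    edges = [(u, v)
             for u in range(n_nodes) for v in range(u + 1, n_nodes)
             if rng.random() < edge_p]
    # Random binary features: each entry is 1 with probability frac_positive.
    features = [[1 if rng.random() < frac_positive else 0
                 for _ in range(n_features)]
                for _ in range(n_nodes)]
    # Ground-truth mask: a random fraction of features marked "important"
    # for the white-box model M.
    n_important = int(frac_important * n_features)
    important = set(rng.sample(range(n_features), n_important))
    return edges, features, important

edges, features, important = make_synthetic_instance()
print(len(features), len(features[0]), len(important))  # 100 50 10
```

Varying `frac_positive` over a grid would reproduce the "array of such datasets" the quoted setup refers to; the known `important` set is what makes the model a white box against which explainer output can be scored.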