A True-to-the-model Axiomatic Benchmark for Graph-based Explainers

Authors: Corrado Monti, Paolo Bajardi, Francesco Bonchi, André Panisson, Alan Perotti

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply this framework to both synthetic and real data and evaluate various state-of-the-art explainers, thus characterizing their behavior. Our findings highlight how explainers often react in a rather counter-intuitive fashion to technical details that might be easily overlooked. Our approach offers valuable insights and recommended practices for selecting the right explainer given the task at hand, and for developing new methods for explaining graph-learning models.
Researcher Affiliation | Academia | All five authors (Corrado Monti, Paolo Bajardi, Francesco Bonchi, André Panisson, Alan Perotti) are affiliated with the CENTAI Institute, Turin, Italy.
Pseudocode | No | The paper describes its algorithms and models in prose and refers to PyTorch implementations, but it does not include any clearly labeled pseudocode or algorithm blocks. For example, the formal proof of Theorem 1 is provided in Appendix A, but this is a mathematical proof, not pseudocode.
Open Source Code | Yes | We provide code to let other researchers use our framework to test or to develop new graph explainers. Our code is available at https://github.com/corradomonti/axiomatic-g-xai.
Open Datasets | Yes | Finally, we also test a real-world dataset (He & McAuley, 2016) with 786 anonymized Facebook users (with 319 binary features) and 14024 edges between them.
Dataset Splits | No | The paper mentions generating synthetic data and testing explainers on a "set of test nodes" for both the synthetic and real-world datasets. However, it does not specify explicit training/validation/test splits, percentages, or a methodology for partitioning the data in the conventional sense, since the white-box models are not trained. For the evaluation of explainers, it states "Run the explainer E on M for a set of test nodes v ∈ V" but does not specify how these test nodes are selected from a larger set or whether any portion is reserved for other purposes.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU specifications, or memory used for running the experiments. It focuses on the methodology and results without specifying the underlying computational resources.
Software Dependencies | No | The paper mentions a "PyTorch implementation" multiple times for its white-box models and refers to "Captum" for Integrated Gradients and a "GraphLIME" implementation. However, it does not provide version numbers for PyTorch, Captum, or any other software libraries or frameworks, which are necessary for reproducible software dependencies.
Experiment Setup | Yes | Datasets. Although our white boxes are not trained, our framework still needs data. For our purposes, we use an array of Erdős–Rényi graphs with 100 nodes and a set of 50 random binary features for each node. We generate an array of such datasets by varying the fraction of positive features. We opted for this kind of random graph in order to test the explainers on a networked system with topological features drastically different from the real dataset. In our experiments, we randomly select a given fraction of features as important (varying among experiments) for the model M, and we set the value of γ to 1. Fixes and warnings. ...k being a parameter that we set to 2. Fixes and warnings. ...In this work the default values of hyperparameters are used across all experiments, i.e., the regularization hyperparameter for subgraph size is 0.005 and for feature explanation is 1.
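The synthetic setup quoted above (an Erdős–Rényi graph with 100 nodes, 50 random binary features per node, and a randomly chosen subset of "important" features for the white-box model M) can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's code: the function name, the edge probability `edge_p`, and the default fractions `frac_positive` and `frac_important` are assumed values, since the paper varies these across experiments.

```python
import random

def make_synthetic_instance(n_nodes=100, n_features=50, edge_p=0.1,
                            frac_positive=0.5, frac_important=0.2, seed=0):
    """One Erdos-Renyi benchmark instance in the style described by the paper.

    edge_p, frac_positive, and frac_important are illustrative defaults,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    # Erdos-Renyi G(n, p): include each possible undirected edge
    # independently with probability edge_p.
    edges = [(u, v)
             for u in range(n_nodes) for v in range(u + 1, n_nodes)
             if rng.random() < edge_p]
    # Random binary features: each entry is 1 with probability frac_positive.
    features = [[1 if rng.random() < frac_positive else 0
                 for _ in range(n_features)]
                for _ in range(n_nodes)]
    # Ground-truth mask: a random fraction of features marked "important"
    # for the white-box model M.
    n_important = int(frac_important * n_features)
    important = set(rng.sample(range(n_features), n_important))
    return edges, features, important

edges, features, important = make_synthetic_instance()
print(len(features), len(features[0]), len(important))  # 100 50 10
```

Varying `frac_positive` over a grid would reproduce the "array of such datasets" the quoted setup refers to; the known `important` set is what makes the model a white box against which explainer output can be scored.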