Explaining Node Embeddings

Authors: Zohair Shafi, Ayan Chatterjee, Tina Eliassi-Rad

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test XM on a variety of real-world graphs and show that XM not only preserves the performance of existing node embedding methods, but also enhances their explainability. We provide an ablation study to analyze the impact of each constraint. XM outputs explainable node embeddings that are tested on a variety of real-world graphs, achieving downstream task performances comparable to the state of the art.
Researcher Affiliation | Academia | Zohair Shafi (EMAIL), Northeastern University, Boston, MA, USA; Ayan Chatterjee (EMAIL), Northeastern University, Boston, MA, USA; Tina Eliassi-Rad (EMAIL), Northeastern University, Boston, MA, USA
Pseudocode | No | The paper gives mathematical formulations for the loss functions in equations (2), (3), (4), and (5) but does not contain a dedicated section, figure, or block explicitly labeled "Pseudocode" or "Algorithm" with structured steps.
Open Source Code | Yes | Our choices for the hyperparameters are available in our online code repository at https://github.com/zohairshafi/Explaining Node Embeddings
Open Datasets | Yes | Table 3: Real-world networks used in our experiments: EU Email (Leskovec et al., 2007), US Airport (Zhu et al., 2021), Squirrel (Rozemberczki et al., 2021), Citeseer (Bollacker et al., 1998), FB15K-237 (Toutanova & Chen, 2015), PubMed (Roberts, 2001)
Dataset Splits | Yes | We use link prediction as our downstream task and sample an equal number of positive and negative edges from the graph. We use 60% of the edges as training data and the remaining as test data. The embeddings are passed through a simple 2-layer fully connected neural network. We repeat the entire process 3 times to create a three-fold cross-validation setup.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory configurations) used for running the experiments. It only mentions runtimes per epoch without specifying the underlying hardware.
Software Dependencies | No | The paper does not explicitly state the versions of the software dependencies, programming languages, or libraries (e.g., Python, PyTorch, TensorFlow, scikit-learn, CUDA) used to implement the methodology or run the experiments.
Experiment Setup | Yes | We use 128-dimensional embeddings across all algorithms and networks for consistency, and pass in sense features as node attributes when running DGI and GMI. For this study, we embed the network into 32 dimensions using DGI+XM for brevity. We train a 2-layer GCN with 32 hidden units each for node classification and apply the explainers post-hoc to generate node feature importance scores. Our choices for the hyperparameters are available in our online code repository at https://github.com/zohairshafi/Explaining Node Embeddings
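The split protocol described in the Dataset Splits row can be sketched as follows. This is a minimal stand-in, not the authors' code: the embeddings and edges are random toy data, the Hadamard edge encoding is a common convention the paper does not specify, and scikit-learn's MLPClassifier substitutes for the paper's 2-layer fully connected network.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Toy stand-ins: 100 nodes with 128-dim embeddings (the paper's dimension)
# and synthetic edge lists; real runs would use XM embeddings and graph edges.
n_nodes, dim = 100, 128
emb = rng.normal(size=(n_nodes, dim))
pos_edges = rng.integers(0, n_nodes, size=(200, 2))
# Equal number of negative samples, as in the paper (approximated here by
# random pairs, without checking for collisions with true edges).
neg_edges = rng.integers(0, n_nodes, size=(200, 2))

def edge_features(edges):
    # Hadamard product of endpoint embeddings -- one common edge encoding.
    return emb[edges[:, 0]] * emb[edges[:, 1]]

X = np.vstack([edge_features(pos_edges), edge_features(neg_edges)])
y = np.concatenate([np.ones(len(pos_edges)), np.zeros(len(neg_edges))])

# 60% train / 40% test, repeated 3 times as described in the paper.
aucs = []
for seed in range(3):
    idx = np.random.default_rng(seed).permutation(len(y))
    cut = int(0.6 * len(y))
    tr, te = idx[:cut], idx[cut:]
    clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300,
                        random_state=seed)
    clf.fit(X[tr], y[tr])
    aucs.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))

print(round(float(np.mean(aucs)), 3))
```

Because the toy edges carry no real signal, the mean AUC here hovers near chance; the structure of the loop, not the score, is the point.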
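The 2-layer GCN mentioned in the Experiment Setup row can be sketched as a forward pass in NumPy. The graph, features, and weights below are illustrative stand-ins (the paper trains the network; these weights are untrained), using the standard symmetric normalization from Kipf & Welling's GCN formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_adj(A):
    """Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A_norm, X, W1, W2):
    """Two GCN layers: ReLU after the first, class logits from the second."""
    H = np.maximum(A_norm @ X @ W1, 0.0)  # layer 1: 32 hidden units
    return A_norm @ H @ W2                # layer 2: per-node class logits

# Toy 4-node path graph with 32-dim node features (the paper's hidden size)
# and 3 hypothetical classes.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 32))
W1 = rng.normal(size=(32, 32)) * 0.1
W2 = rng.normal(size=(32, 3)) * 0.1

logits = gcn_forward(normalize_adj(A), X, W1, W2)
print(logits.shape)  # prints (4, 3): one logit vector per node
```

Node feature importance explainers are then applied post-hoc to a trained model of this shape, as the row describes.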