Explaining Node Embeddings

Authors: Zohair Shafi, Ayan Chatterjee, Tina Eliassi-Rad

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test XM on a variety of real-world graphs and show that XM not only preserves the performance of existing node embedding methods, but also enhances their explainability. We provide an ablation study to analyze the impact of each constraint. XM outputs explainable node embeddings that are tested on a variety of real-world graphs, achieving downstream task performances comparable to the state of the art.
Researcher Affiliation | Academia | Zohair Shafi (EMAIL), Northeastern University, Boston, MA, USA; Ayan Chatterjee (EMAIL), Northeastern University, Boston, MA, USA; Tina Eliassi-Rad (EMAIL), Northeastern University, Boston, MA, USA
Pseudocode | No | The paper gives mathematical formulations for the loss functions in equations (2), (3), (4), and (5) but does not contain a dedicated section, figure, or block explicitly labeled "Pseudocode" or "Algorithm" with structured steps.
Open Source Code | Yes | Our choices for the hyperparameters are available in our online code repository at https://github.com/zohairshafi/Explaining Node Embeddings
Open Datasets | Yes | Table 3: Real-world networks used in our experiments: EU Email (Leskovec et al., 2007), US Airport (Zhu et al., 2021), Squirrel (Rozemberczki et al., 2021), Citeseer (Bollacker et al., 1998), FB15K-237 (Toutanova & Chen, 2015), PubMed (Roberts, 2001)
Dataset Splits | Yes | We use link prediction as our downstream task and sample an equal number of positive and negative edges from the graph. We use 60% of the edges as training data and the remaining as test data. The embeddings are passed through a simple 2-layer fully connected neural network. We repeat the entire process 3 times to create a three-fold cross-validation setup.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory configurations) used for running the experiments. It only mentions runtimes per epoch without specifying the underlying hardware.
Software Dependencies | No | The paper does not explicitly state the versions of the software dependencies, programming languages, or libraries (e.g., Python, PyTorch, TensorFlow, scikit-learn, CUDA) used to implement the methodology or run the experiments.
Experiment Setup | Yes | We use 128-dimensional embeddings across all algorithms and networks for consistency, and pass in sense features as node attributes when running DGI and GMI. For this study, we embed the network into 32 dimensions using DGI+XM for brevity. We train a 2-layer GCN with 32 hidden units each for node classification and apply the explainers post-hoc to generate node feature importance scores. Our choices for the hyperparameters are available in our online code repository at https://github.com/zohairshafi/Explaining Node Embeddings
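The split protocol described in the Dataset Splits row can be sketched as follows. This is a minimal stand-in, not the authors' code: the embeddings and edges are random toy data, the Hadamard edge encoding is a common convention the paper does not specify, and scikit-learn's MLPClassifier substitutes for the paper's 2-layer fully connected network.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Toy stand-ins: 100 nodes with 128-dim embeddings (the paper's dimension)
# and synthetic edge lists; real runs would use XM embeddings and graph edges.
n_nodes, dim = 100, 128
emb = rng.normal(size=(n_nodes, dim))
pos_edges = rng.integers(0, n_nodes, size=(200, 2))
# Equal number of negative samples, as in the paper (approximated here by
# random pairs, without checking for collisions with true edges).
neg_edges = rng.integers(0, n_nodes, size=(200, 2))

def edge_features(edges):
    # Hadamard product of endpoint embeddings -- one common edge encoding.
    return emb[edges[:, 0]] * emb[edges[:, 1]]

X = np.vstack([edge_features(pos_edges), edge_features(neg_edges)])
y = np.concatenate([np.ones(len(pos_edges)), np.zeros(len(neg_edges))])

# 60% train / 40% test, repeated 3 times as described in the paper.
aucs = []
for seed in range(3):
    idx = np.random.default_rng(seed).permutation(len(y))
    cut = int(0.6 * len(y))
    tr, te = idx[:cut], idx[cut:]
    clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300,
                        random_state=seed)
    clf.fit(X[tr], y[tr])
    aucs.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))

print(round(float(np.mean(aucs)), 3))
```

Because the toy edges carry no real signal, the mean AUC here hovers near chance; the structure of the loop, not the score, is the point.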
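The 2-layer GCN mentioned in the Experiment Setup row can be sketched as a forward pass in NumPy. The graph, features, and weights below are illustrative stand-ins (the paper trains the network; these weights are untrained), using the standard symmetric normalization from Kipf & Welling's GCN formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_adj(A):
    """Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A_norm, X, W1, W2):
    """Two GCN layers: ReLU after the first, class logits from the second."""
    H = np.maximum(A_norm @ X @ W1, 0.0)  # layer 1: 32 hidden units
    return A_norm @ H @ W2                # layer 2: per-node class logits

# Toy 4-node path graph with 32-dim node features (the paper's hidden size)
# and 3 hypothetical classes.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 32))
W1 = rng.normal(size=(32, 32)) * 0.1
W2 = rng.normal(size=(32, 3)) * 0.1

logits = gcn_forward(normalize_adj(A), X, W1, W2)
print(logits.shape)  # prints (4, 3): one logit vector per node
```

Node feature importance explainers are then applied post-hoc to a trained model of this shape, as the row describes.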