Full-Rank Unsupervised Node Embeddings for Directed Graphs via Message Aggregation
Authors: Ciwan Ceylan, Kambiz Ghoorchian, Danica Kragic
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically demonstrate the superior graph alignment accuracy of the ACC model compared to PCAPass, highlighting the negative impact of rank deficiency. Additionally, we compare ACC to state-of-the-art self-supervised graph neural networks (SSGNNs) across five standard node classification benchmarks for directed graphs (Rossi et al., 2023). |
| Researcher Affiliation | Collaboration | Ciwan Ceylan EMAIL Division of Robotics, Perception and Learning KTH Royal Institute of Technology Kambiz Ghoorchian EMAIL SEB Group Danica Kragic EMAIL Division of Robotics, Perception and Learning KTH Royal Institute of Technology |
| Pseudocode | Yes | Algorithm 1: The ACC algorithm for a directed graph G with node features X using K message-passing iterations, desired dimensionality pmax, minimal message size cmin, and relative tolerance θ ∈ [0, 1]. Algorithm 2: PCA compression computed via SVD as used for ACC. Here X ∈ ℝ^{n×p} is a real-valued matrix, c an integer s.t. c ≤ p, and θ ∈ [0, 1] a relative tolerance. |
| Open Source Code | Yes | ACC (Aggregate, Compress, Concatenate), a linear message-passing model designed to prevent rank deficiency while maintaining scalability by generating embeddings in a single forward pass. As shown in Figure 1, ACC applies aggregation and PCA compression to the message matrices from the previous iteration, rather than the embeddings, and constructs the embeddings via concatenation separately. This message-aggregation approach breaks the feedback loop present in PCAPass and thereby avoids computing the redundant features that cause rank deficiency. ACC model code: https://github.com/ciwanceylan/acc-mp. Experiments code: https://github.com/ciwanceylan/acc-experiments-tmlr2024. |
| Open Datasets | Yes | We conduct experiments using four datasets: Arenas, PPI, Enron, and Polblogs. Arenas and PPI are widely used undirected benchmark graphs (Heimann et al., 2018; Jin et al., 2021; Skitsas et al., 2023), while Enron and Polblogs provide examples of directed graphs. Additionally, we evaluate performance on the real-world Magna dataset (Saraph & Milenković, 2014), where the perturbed graph G2 contains 15% more edges than G1. To compare ACC with self-supervised graph neural networks (SSGNNs), we evaluate node classification accuracy on five standard directed graph datasets from the literature (Pei et al., 2020; Lim et al., 2021; Platonov et al., 2023; Rossi et al., 2023). |
| Dataset Splits | Yes | To ensure robustness, we perform three repeats of 5-fold cross-validation with five different random seeds, reporting mean and standard deviation statistics for the classification accuracy. By training a gradient boosting classifier on each feature group with an 80-20 training-test split, we find the following test accuracies: 51% for X, 33% for AFX, and 84% for ABX. |
| Hardware Specification | Yes | All models are executed in a Google Cloud g2-standard-32 environment with one Nvidia L4 24GB GPU, 32 vCPUs @ 2.20GHz, and 128 GB of memory. |
| Software Dependencies | No | The paper mentions using "Scikit-learn's implementation of LightGBM (Ke et al., 2017)" but does not provide specific version numbers for either Scikit-learn or LightGBM. It also lists several models and their implementations from PyTorch Geometric without specifying version numbers for PyTorch Geometric or PyTorch itself. |
| Experiment Setup | Yes | For both ACC and PCAPass, we set the maximum embedding dimension to pmax = 512 and use node structural features as input, X ∈ ℝ^{n×d}. For all models, including ACC, we use K = 2 message-passing iterations and p = 512 embedding dimensions, both of which are commonly used values in the literature (Hamilton et al., 2017; Veličković et al., 2018; Zhang et al., 2021; Thakoor et al., 2022; Hou et al., 2022). We use default values for optimizer and loss function hyperparameters. |