Full-Rank Unsupervised Node Embeddings for Directed Graphs via Message Aggregation
Authors: Ciwan Ceylan, Kambiz Ghoorchian, Danica Kragic
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically demonstrate the superior graph alignment accuracy of the ACC model compared to PCAPass, highlighting the negative impact of rank deficiency. Additionally, we compare ACC to state-of-the-art self-supervised graph neural networks (SSGNNs) across five standard node classification benchmarks for directed graphs (Rossi et al., 2023). |
| Researcher Affiliation | Collaboration | Ciwan Ceylan EMAIL Division of Robotics, Perception and Learning KTH Royal Institute of Technology Kambiz Ghoorchian EMAIL SEB Group Danica Kragic EMAIL Division of Robotics, Perception and Learning KTH Royal Institute of Technology |
| Pseudocode | Yes | Algorithm 1: The ACC algorithm for a directed graph G with node features X using K message-passing iterations, desired dimensionality pmax, minimal message size cmin, and relative tolerance θ ∈ [0, 1]. Algorithm 2: PCA compression computed via SVD as used for ACC. Here X ∈ ℝ^{n×p} is a real-valued matrix, c an integer s.t. c ≤ p, and θ ∈ [0, 1] a relative tolerance. |
| Open Source Code | Yes | ACC (Aggregate, Compress, Concatenate), a linear message-passing model designed to prevent rank deficiency while maintaining scalability by generating embeddings in a single forward pass. As shown in Figure 1, ACC applies aggregation and PCA compression to the message matrices from the previous iteration, rather than the embeddings, and constructs the embeddings via concatenation separately. This message-aggregation approach breaks the feedback loop present in PCAPass and thereby avoids computing the redundant features that cause rank deficiency. ACC model code: https://github.com/ciwanceylan/acc-mp. Experiments code: https://github.com/ciwanceylan/acc-experiments-tmlr2024. |
| Open Datasets | Yes | We conduct experiments using four datasets: Arenas, PPI, Enron, and Polblogs. Arenas and PPI are widely used undirected benchmark graphs (Heimann et al., 2018; Jin et al., 2021; Skitsas et al., 2023), while Enron and Polblogs provide examples of directed graphs. Additionally, we evaluate performance on the real-world Magna dataset (Saraph & Milenković, 2014), where the perturbed graph G2 contains 15% more edges than G1. To compare ACC with self-supervised graph neural networks (SSGNNs), we evaluate node classification accuracy on five standard directed graph datasets from the literature (Pei et al., 2020; Lim et al., 2021; Platonov et al., 2023; Rossi et al., 2023). |
| Dataset Splits | Yes | To ensure robustness, we perform three repeats of 5-fold cross-validation with five different random seeds, reporting mean and standard deviation statistics for the classification accuracy. By training a gradient boosting classifier on each feature group with an 80-20 training-test split, we find the following test accuracies: 51% for X, 33% for AFX, and 84% for ABX. |
| Hardware Specification | Yes | All models are executed in a Google Cloud g2-standard-32 environment with one Nvidia L4 24GB GPU, 32 vCPUs @ 2.20GHz, and 128 GB of memory. |
| Software Dependencies | No | The paper mentions using "Scikit-learn's implementation of LightGBM (Ke et al., 2017)" but does not provide specific version numbers for either Scikit-learn or LightGBM. It also lists several models and their implementations from PyTorch Geometric without specifying version numbers for PyTorch Geometric or PyTorch itself. |
| Experiment Setup | Yes | For both ACC and PCAPass, we set the maximum embedding dimension to pmax = 512 and use node structural features as input, X ∈ ℝ^{n×d}. For all models, including ACC, we use K = 2 message-passing iterations and p = 512 embedding dimensions, both of which are commonly used values in the literature (Hamilton et al., 2017; Veličković et al., 2018; Zhang et al., 2021; Thakoor et al., 2022; Hou et al., 2022). We use default values for optimizer and loss function hyperparameters. |