Topological Node2vec: Enhanced Graph Embedding via Persistent Homology

Authors: Yasuaki Hiraoka, Yusuke Imoto, Théo Lacombe, Killian Meehan, Toshiaki Yachimura

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments suggest Node2vec struggles to recreate the topology of the input graph. To resolve this we introduce a topological loss term to be added to the training loss of Node2vec which tries to align the persistence diagram (PD) of the resulting embedding as closely as possible to that of the input graph. Following results in computational optimal transport, we carefully adapt entropic regularization to PD metrics, allowing us to measure the discrepancy between PDs in a differentiable way. Our modified loss function can then be minimized through gradient descent to reconstruct both the geometry and the topology of the input graph. We showcase the benefits of this approach using demonstrative synthetic examples.
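The entropic-regularized PD comparison described above can be sketched in a few lines of NumPy. This is an illustrative simplification, not the paper's implementation: the function name is hypothetical, each diagram is augmented with the diagonal projections of the other's points (a standard device in optimal transport between persistence diagrams), and uniform marginals replace the paper's careful adaptation of entropic regularization to PD metrics.

```python
import numpy as np

def sinkhorn_pd_cost(D1, D2, eps=0.1, n_iter=200):
    """Entropic-regularized OT cost between persistence diagrams.

    D1, D2: (n, 2) / (m, 2) arrays of (birth, death) pairs.
    Each diagram is padded with the diagonal projections of the other's
    points, so every point can be matched either to a point of the other
    diagram or to the diagonal. Simplified sketch: uniform marginals,
    plain Sinkhorn iterations.
    """
    # orthogonal projection of each point onto the diagonal {birth = death}
    proj = lambda D: np.column_stack([(D[:, 0] + D[:, 1]) / 2] * 2)
    X = np.vstack([D1, proj(D2)])            # augmented source points
    Y = np.vstack([D2, proj(D1)])            # augmented target points
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # squared-distance cost
    n, m = X.shape[0], Y.shape[0]
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / eps)                     # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):                  # Sinkhorn fixed-point updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]          # entropic transport plan
    return float((P * C).sum())
```

Because every operation is a smooth function of the diagram coordinates, the same computation expressed in an autodiff framework yields the differentiable discrepancy that can be added to the Node2vec training loss.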
Researcher Affiliation | Academia | Yasuaki Hiraoka, EMAIL, Institute for the Advanced Study of Human Biology, Kyoto University Institute for Advanced Study, Kyoto University, Kyoto 606-8501, Japan; Yusuke Imoto, EMAIL, Institute for the Advanced Study of Human Biology, Kyoto University Institute for Advanced Study, Kyoto University, Kyoto 606-8501, Japan; Théo Lacombe, EMAIL, Laboratoire d'Informatique Gaspard Monge, Univ. Gustave Eiffel, CNRS, LIGM, F-77454 Marne-la-Vallée, France; Killian Meehan, EMAIL, Institute for the Advanced Study of Human Biology, Kyoto University Institute for Advanced Study, Kyoto University, Kyoto 606-8501, Japan; Toshiaki Yachimura, EMAIL, Mathematical Science Center for Co-creative Society, Tohoku University, Sendai 980-0845, Japan
Pseudocode | Yes | Algorithm 1 (Node2vec algorithm); Algorithm 2 (Topological Node2vec algorithm)
Open Source Code | Yes | Our implementation is publicly available at this repository.
Open Datasets | No | We look at a dataset consisting of 8 circles each with 16 points arranged radially in a larger circle, seen in Figure 5. While the points and the sampled circles themselves are distributed uniformly, each point is wiggled by some small random noise, ensuring that the input data satisfies Vietoris-Rips general position (which is guaranteed if all pairwise distances are unique). We sample a torus by arranging points uniformly over the surface. (These are not uniformly sampled from a distribution, but rather a constructed covering. See the demonstration notebook in the repository for the precise details.)
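The eight-circles dataset described above is easy to regenerate; a minimal sketch follows. The function name, radii, and noise scale are illustrative assumptions (the paper's exact values are in its demonstration notebook); the jitter is the source's "small random noise" that makes all pairwise distances unique, i.e. Vietoris-Rips general position.

```python
import numpy as np

def eight_circles(n_circles=8, pts_per_circle=16,
                  big_radius=4.0, small_radius=1.0,
                  noise=1e-3, seed=0):
    """16 points on each of 8 small circles, whose centers sit uniformly
    on a larger circle. Radii and noise scale are illustrative choices,
    not the paper's exact parameters."""
    rng = np.random.default_rng(seed)
    pts = []
    for i in range(n_circles):
        phi = 2 * np.pi * i / n_circles          # center angle on big circle
        cx, cy = big_radius * np.cos(phi), big_radius * np.sin(phi)
        for j in range(pts_per_circle):
            t = 2 * np.pi * j / pts_per_circle   # angle on small circle
            pts.append([cx + small_radius * np.cos(t),
                        cy + small_radius * np.sin(t)])
    pts = np.asarray(pts)
    # small jitter breaks ties so all pairwise distances are unique
    return pts + noise * rng.standard_normal(pts.shape)
```

With the default arguments this yields 128 points whose pairwise distances are (almost surely) pairwise distinct.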
Dataset Splits | No | The paper uses synthetic datasets which are generated rather than relying on pre-existing, fixed datasets with explicit train/test/validation splits. While minibatches are mentioned for training, these refer to subsampling of the dynamically generated data points during training, not pre-defined, reproducible dataset partitions typically found in traditional machine learning tasks.
Hardware Specification | No | Our implementation is publicly available at this repository. Its implementation provides both CPU and GPU backends.
Software Dependencies | No | The CPU backend relies on Gudhi (Maria et al., 2014), while the GPU backend is based on a fork of Ripser++ (Bauer, 2021; Zhang et al., 2020b) where we adapted the code in order to access the correspondence between a generator x in the PD and the points in the point cloud responsible for the birth edge and death edge. For an in-depth examination of all the hyperparameters of this network (both those related to original Node2vec as well as our topological additions), please see the readme file and examples notebook in the repository.
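The generator-to-edge correspondence mentioned above (which point pairs create and kill each PD class) is the hook that lets gradients of the topological loss flow back to point coordinates. A self-contained way to see it, for H0 only, is Kruskal's algorithm with union-find on the Vietoris-Rips filtration; the function below is an illustrative pure-NumPy sketch, not the paper's Gudhi / Ripser++ machinery, which also handles higher homology dimensions.

```python
import numpy as np

def h0_persistence_pairs(points):
    """H0 persistence of the Vietoris-Rips filtration of a point cloud.

    Every H0 class is born at filtration value 0; a finite class dies at
    the length of the edge (i, j) that merges its component into another.
    Returns (birth, death, (i, j)) triples, exposing the death-edge
    correspondence needed to differentiate through the diagram.
    Illustrative sketch for H0 only.
    """
    n = len(points)
    D = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    # edges sorted by length: Kruskal's order of the filtration
    edges = sorted((D[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))
    def find(x):                      # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    pairs = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                  # edge merges two components
            parent[ri] = rj
            pairs.append((0.0, d, (i, j)))
    return pairs  # n - 1 finite pairs; one essential class never dies
```

Because each finite death value is exactly one entry of the distance matrix, perturbing the two endpoint coordinates of that edge moves the corresponding diagram point, which is what a gradient step on the topological loss exploits.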
Experiment Setup | Yes | Input: a weighted graph G = (V, E, w), learning rate η > 0, embedding dimension m, neighborhood parameters (l, r, p, q); ... Input: a weighted graph G = (V, E, w), learning rates η, λ0, λ1, embedding dimension m, neighborhood parameters (l, r, p, q), entropic regularization parameter ϵ > 0; ... (a)-(d) We see that l = 1 and r large give the best results. (e) This embedding was done using full-column vectors for neighborhood information. We note the dramatic loss of topology in Node2vec and the equally dramatic recovery of such features with the inclusion of the topological loss function (14) demonstrated in Figure 8. The topological loss term for this example uses homology dimensions 1 (loops) and 2 (voids). Remark 16 In practice, we set λ1 = 0 for the first few epochs (no topological loss) in order to obtain an embedding with mostly accurate geometry, after which we allow λ1 > 0 to begin correcting the topology.
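The warm-up strategy of Remark 16 amounts to a simple weight schedule on the topological term. The sketch below is a hypothetical rendering (function name, warm-up length, and final weight are illustrative, not values from the paper); it shows how the combined objective would be assembled per epoch.

```python
def lambda1_schedule(epoch, warmup_epochs=50, lambda1_final=1.0):
    """Per Remark 16: no topological loss (lambda1 = 0) during the
    geometric warm-up, then a positive weight to correct the topology.
    Warm-up length and final weight are illustrative choices."""
    return 0.0 if epoch < warmup_epochs else lambda1_final

# Per-epoch objective (names illustrative), combining the Node2vec loss
# with the weighted topological loss of equation (14):
#   loss = node2vec_loss + lambda1_schedule(epoch) * topological_loss
```

Deferring the topological term this way lets the embedding first settle into roughly correct geometry, so the later PD-matching gradients adjust an already sensible layout rather than fighting a random initialization.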