Topological Node2vec: Enhanced Graph Embedding via Persistent Homology

Authors: Yasuaki Hiraoka, Yusuke Imoto, Théo Lacombe, Killian Meehan, Toshiaki Yachimura

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments suggest Node2vec struggles to recreate the topology of the input graph. To resolve this we introduce a topological loss term to be added to the training loss of Node2vec which tries to align the persistence diagram (PD) of the resulting embedding as closely as possible to that of the input graph. Following results in computational optimal transport, we carefully adapt entropic regularization to PD metrics, allowing us to measure the discrepancy between PDs in a differentiable way. Our modified loss function can then be minimized through gradient descent to reconstruct both the geometry and the topology of the input graph. We showcase the benefits of this approach using demonstrative synthetic examples.
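The entropic-regularized PD comparison described above can be sketched in a few lines of NumPy. This is an illustrative simplification, not the paper's implementation: the function name is hypothetical, each diagram is augmented with the diagonal projections of the other's points (a standard device in optimal transport between persistence diagrams), and uniform marginals replace the paper's careful adaptation of entropic regularization to PD metrics.

```python
import numpy as np

def sinkhorn_pd_cost(D1, D2, eps=0.1, n_iter=200):
    """Entropic-regularized OT cost between persistence diagrams.

    D1, D2: (n, 2) / (m, 2) arrays of (birth, death) pairs.
    Each diagram is padded with the diagonal projections of the other's
    points, so every point can be matched either to a point of the other
    diagram or to the diagonal. Simplified sketch: uniform marginals,
    plain Sinkhorn iterations.
    """
    # orthogonal projection of each point onto the diagonal {birth = death}
    proj = lambda D: np.column_stack([(D[:, 0] + D[:, 1]) / 2] * 2)
    X = np.vstack([D1, proj(D2)])            # augmented source points
    Y = np.vstack([D2, proj(D1)])            # augmented target points
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # squared-distance cost
    n, m = X.shape[0], Y.shape[0]
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / eps)                     # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):                  # Sinkhorn fixed-point updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]          # entropic transport plan
    return float((P * C).sum())
```

Because every operation is a smooth function of the diagram coordinates, the same computation expressed in an autodiff framework yields the differentiable discrepancy that can be added to the Node2vec training loss.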
Researcher Affiliation | Academia | Yasuaki Hiraoka, EMAIL, Institute for the Advanced Study of Human Biology, Kyoto University Institute for Advanced Study, Kyoto University, Kyoto 606-8501, Japan; Yusuke Imoto, EMAIL, Institute for the Advanced Study of Human Biology, Kyoto University Institute for Advanced Study, Kyoto University, Kyoto 606-8501, Japan; Théo Lacombe, EMAIL, Laboratoire d'Informatique Gaspard Monge, Univ. Gustave Eiffel, CNRS, LIGM, F-77454 Marne-la-Vallée, France; Killian Meehan, EMAIL, Institute for the Advanced Study of Human Biology, Kyoto University Institute for Advanced Study, Kyoto University, Kyoto 606-8501, Japan; Toshiaki Yachimura, EMAIL, Mathematical Science Center for Co-creative Society, Tohoku University, Sendai 980-0845, Japan
Pseudocode | Yes | Algorithm 1 (Node2vec algorithm); Algorithm 2 (Topological Node2vec algorithm)
Open Source Code | Yes | Our implementation is publicly available at this repository.
Open Datasets | No | We look at a dataset consisting of 8 circles each with 16 points arranged radially in a larger circle, seen in Figure 5. While the points and the sampled circles themselves are distributed uniformly, each point is wiggled by some small random noise, ensuring that the input data satisfies Vietoris-Rips general position (which is guaranteed if all pairwise distances are unique). We sample a torus by arranging points uniformly over the surface. (These are not uniformly sampled from a distribution, but rather a constructed covering. See the demonstration notebook in the repository for the precise details.)
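The eight-circles dataset described above is easy to regenerate; a minimal sketch follows. The function name, radii, and noise scale are illustrative assumptions (the paper's exact values are in its demonstration notebook); the jitter is the source's "small random noise" that makes all pairwise distances unique, i.e. Vietoris-Rips general position.

```python
import numpy as np

def eight_circles(n_circles=8, pts_per_circle=16,
                  big_radius=4.0, small_radius=1.0,
                  noise=1e-3, seed=0):
    """16 points on each of 8 small circles, whose centers sit uniformly
    on a larger circle. Radii and noise scale are illustrative choices,
    not the paper's exact parameters."""
    rng = np.random.default_rng(seed)
    pts = []
    for i in range(n_circles):
        phi = 2 * np.pi * i / n_circles          # center angle on big circle
        cx, cy = big_radius * np.cos(phi), big_radius * np.sin(phi)
        for j in range(pts_per_circle):
            t = 2 * np.pi * j / pts_per_circle   # angle on small circle
            pts.append([cx + small_radius * np.cos(t),
                        cy + small_radius * np.sin(t)])
    pts = np.asarray(pts)
    # small jitter breaks ties so all pairwise distances are unique
    return pts + noise * rng.standard_normal(pts.shape)
```

With the default arguments this yields 128 points whose pairwise distances are (almost surely) pairwise distinct.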
Dataset Splits | No | The paper uses synthetic datasets which are generated rather than relying on pre-existing, fixed datasets with explicit train/test/validation splits. While minibatches are mentioned for training, these refer to subsampling of the dynamically generated data points during training, not pre-defined, reproducible dataset partitions typically found in traditional machine learning tasks.
Hardware Specification | No | Our implementation is publicly available at this repository. Its implementation provides both CPU and GPU backends.
Software Dependencies | No | The CPU backend relies on Gudhi (Maria et al., 2014), while the GPU backend is based on a fork of Ripser++ (Bauer, 2021; Zhang et al., 2020b) where we adapted the code in order to access the correspondence between a generator x in the PD and the points in the point cloud responsible for the birth edge and death edge. For an in-depth examination of all the hyperparameters of this network (both those related to original Node2vec as well as our topological additions), please see the readme file and examples notebook in the repository.
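The generator-to-edge correspondence mentioned above (which point pairs create and kill each PD class) is the hook that lets gradients of the topological loss flow back to point coordinates. A self-contained way to see it, for H0 only, is Kruskal's algorithm with union-find on the Vietoris-Rips filtration; the function below is an illustrative pure-NumPy sketch, not the paper's Gudhi / Ripser++ machinery, which also handles higher homology dimensions.

```python
import numpy as np

def h0_persistence_pairs(points):
    """H0 persistence of the Vietoris-Rips filtration of a point cloud.

    Every H0 class is born at filtration value 0; a finite class dies at
    the length of the edge (i, j) that merges its component into another.
    Returns (birth, death, (i, j)) triples, exposing the death-edge
    correspondence needed to differentiate through the diagram.
    Illustrative sketch for H0 only.
    """
    n = len(points)
    D = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    # edges sorted by length: Kruskal's order of the filtration
    edges = sorted((D[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))
    def find(x):                      # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    pairs = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                  # edge merges two components
            parent[ri] = rj
            pairs.append((0.0, d, (i, j)))
    return pairs  # n - 1 finite pairs; one essential class never dies
```

Because each finite death value is exactly one entry of the distance matrix, perturbing the two endpoint coordinates of that edge moves the corresponding diagram point, which is what a gradient step on the topological loss exploits.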
Experiment Setup | Yes | Input: a weighted graph G = (V, E, w), learning rate η > 0, embedding dimension m, neighborhood parameters (l, r, p, q); ... Input: a weighted graph G = (V, E, w), learning rates η, λ0, λ1, embedding dimension m, neighborhood parameters (l, r, p, q), entropic regularization parameter ϵ > 0; ... (a)-(d) We see that l = 1 and r large give the best results. (e) This embedding was done using full-column vectors for neighborhood information. We note the dramatic loss of topology in Node2vec and the equally dramatic recovery of such features with the inclusion of the topological loss function (14) demonstrated in Figure 8. The topological loss term for this example uses homology dimensions 1 (loops) and 2 (voids). Remark 16 In practice, we set λ1 = 0 for the first few epochs (no topological loss) in order to obtain an embedding with mostly accurate geometry, after which we allow λ1 > 0 to begin correcting the topology.
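The warm-up strategy of Remark 16 amounts to a simple weight schedule on the topological term. The sketch below is a hypothetical rendering (function name, warm-up length, and final weight are illustrative, not values from the paper); it shows how the combined objective would be assembled per epoch.

```python
def lambda1_schedule(epoch, warmup_epochs=50, lambda1_final=1.0):
    """Per Remark 16: no topological loss (lambda1 = 0) during the
    geometric warm-up, then a positive weight to correct the topology.
    Warm-up length and final weight are illustrative choices."""
    return 0.0 if epoch < warmup_epochs else lambda1_final

# Per-epoch objective (names illustrative), combining the Node2vec loss
# with the weighted topological loss of equation (14):
#   loss = node2vec_loss + lambda1_schedule(epoch) * topological_loss
```

Deferring the topological term this way lets the embedding first settle into roughly correct geometry, so the later PD-matching gradients adjust an already sensible layout rather than fighting a random initialization.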