Weighted Embeddings for Low-Dimensional Graph Representation

Authors: Thomas Bläsius, Jean-Pierre von der Heydt, Maximilian Katzmann, Nikolai Maas

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide the embedding algorithm WEMBED and demonstrate, based on generated as well as over 2000 real-world graphs, that our weighted embeddings heavily outperform state-of-the-art Euclidean embeddings for heterogeneous graphs while using fewer dimensions. The running time of WEMBED and the embedding quality for the remaining instances are on par with state-of-the-art Euclidean embedders."
Researcher Affiliation | Academia | Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
Pseudocode | No | The paper describes the WEMBED algorithm in paragraph text, detailing its steps and components such as weight estimation, gradient descent optimization, and the use of geometric data structures. However, it does not present this information within a clearly structured pseudocode block or algorithm figure.
Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the described methodology (the WEMBED algorithm) or a direct link to a code repository.
Open Datasets | Yes | "We use a set of real-world networks described by Bläsius and Fischbeck (2024) restricted to graphs with n ≤ 10k nodes and m ≤ 50k edges, which amounts to 2173 graphs. We refer to this data set as real."
Dataset Splits | No | The paper evaluates embedding quality via reconstruction accuracy and F1-scores. It mentions sampling "10m non-edges" to approximate F1-scores on large graphs, but it does not specify explicit training, validation, or test splits, either in the traditional supervised-learning sense or for the embedding process itself.
Hardware Specification | Yes | "All experiments were run on a machine with two AMD EPYC Zen2 7742 CPUs (64 cores each) with 2.25-3.35 GHz frequency, 1024 GB RAM and 256 MB L3 cache. To account for randomness, each data point is the average of five runs. To get comparable running times, computation is single-threaded. The only exception is the much slower POINCARÉEMBED, for which we compute only one embedding and use all 128 available cores."
Software Dependencies | No | The paper mentions using the Adam optimizer and the R-tree implementation provided by Boost. However, it does not specify version numbers for these software components or for any other libraries used, which is required for a reproducible description of software dependencies.
Experiment Setup | Yes | "We used the default parameters for DEEPWALK, sigmoid and negative sampling (option 6) for FORCE2VEC, and adjacency similarity (verse neigh) for VERSE. The default parameters for POINCARÉEMBED did not yield good results and we used lr = 0.3, epochs = 300, and batchsize = 50. All experiments were run on a machine with two AMD EPYC Zen2 7742 CPUs (64 cores each) with 2.25-3.35 GHz frequency, 1024 GB RAM and 256 MB L3 cache. To account for randomness, each data point is the average of five runs. To get comparable running times, computation is single-threaded. The only exception is the much slower POINCARÉEMBED, for which we compute only one embedding and use all 128 available cores."
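The Pseudocode row notes that WEMBED is described only in paragraph text (weight estimation, gradient-descent optimization, geometric data structures). A minimal sketch of the weighted-embedding idea follows; the GIRG-style connection rule, the scaling exponent 1/d, and all function names are illustrative assumptions, not the authors' actual formulation.

```python
import numpy as np

def weighted_distance(x_u, x_v, w_u, w_v, d):
    """Weighted distance: Euclidean distance shrunk by the node weights.

    WEMBED's exact scaling is not given as pseudocode in the paper; dividing
    by (w_u * w_v)**(1/d), as in GIRG-style models, is an assumed choice.
    """
    return np.linalg.norm(x_u - x_v) / (w_u * w_v) ** (1.0 / d)

def predict_edges(X, w, threshold=1.0):
    """Predict an edge wherever the weighted distance falls below `threshold`."""
    n, d = X.shape
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            if weighted_distance(X[u], X[v], w[u], w[v], d) < threshold:
                edges.append((u, v))
    return edges

# Toy example: two heavy (e.g. high-degree) nodes connect across the same
# geometric separation at which two light nodes do not.
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
w = np.array([4.0, 4.0, 1.0, 1.0])
print(predict_edges(X, w))  # → [(0, 1)]
```

Making the connection radius weight-dependent is what lets such embeddings capture heterogeneous degree distributions in few dimensions: hubs get large weights instead of needing extra coordinates.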
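The Dataset Splits row mentions approximating reconstruction F1-scores on large graphs by sampling 10m non-edges rather than enumerating all Θ(n²) node pairs. A sketch of that evaluation scheme, assuming a generic edge predictor; the helper names and the uniform sampling strategy are illustrative, only the 10m sample size comes from the paper.

```python
import random

def f1_from_samples(edges, sampled_non_edges, predict):
    """Approximate reconstruction F1: score the predictor on all true edges
    and on a sample of non-edges standing in for the full negative class."""
    tp = sum(predict(u, v) for u, v in edges)
    fn = len(edges) - tp
    fp = sum(predict(u, v) for u, v in sampled_non_edges)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def sample_non_edges(n, edges, k, seed=0):
    """Uniformly sample k node pairs that are not edges (illustrative helper)."""
    rng = random.Random(seed)
    edge_set = {frozenset(e) for e in edges}
    out = []
    while len(out) < k:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v and frozenset((u, v)) not in edge_set:
            out.append((u, v))
    return out

edges = [(0, 1), (1, 2), (2, 3)]
non_edges = sample_non_edges(4, edges, k=10 * len(edges))  # the paper's 10m
perfect = lambda u, v: (u, v) in edges or (v, u) in edges
print(f1_from_samples(edges, non_edges, perfect))  # a perfect predictor scores 1.0
```

Because every true edge is kept while only the negatives are subsampled, recall is exact and only precision carries sampling error.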
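The Experiment Setup row lists the hyperparameters the authors changed away from the defaults for POINCARÉEMBED. Collected as a small config sketch; the key names and dictionary structure are illustrative assumptions, only the values are quoted from the paper.

```python
# Values quoted from the experiment setup; key names are illustrative,
# not the embedders' actual CLI flags or API parameters.
poincare_embed_params = {"lr": 0.3, "epochs": 300, "batchsize": 50}

# The remaining baselines ran with defaults plus the quoted options.
force2vec_options = {"activation": "sigmoid", "sampling": "negative (option 6)"}
verse_options = {"similarity": "adjacency (verse neigh)"}

print(poincare_embed_params)
```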