Deep-Graph-Sprints: Accelerated Representation Learning in Continuous-Time Dynamic Graphs
Authors: Ahmad Naser Eddin, Jacopo Bono, David Oliveira Aparicio, Hugo Ferreira, Pedro Manuel Pinto Ribeiro, Pedro Bizarro
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We benchmark DGS against state-of-the-art (SOTA) feature engineering and graph neural network methods using five diverse datasets. The results indicate that DGS achieves competitive performance while inference speed improves between 4x and 12x compared to other deep learning approaches on our benchmark datasets. |
| Researcher Affiliation | Collaboration | Ahmad Naser Eddin (Feedzai, Portugal; Departamento de Ciência de Computadores, Faculdade de Ciências, Universidade do Porto, Portugal); Jacopo Bono (Feedzai, Portugal); David Aparício (Departamento de Ciência de Computadores, Faculdade de Ciências, Universidade do Porto, Portugal); Hugo Ferreira (Feedzai, Portugal); Pedro Ribeiro (Departamento de Ciência de Computadores, Faculdade de Ciências, Universidade do Porto, Portugal); Pedro Bizarro (Feedzai, Portugal) |
| Pseudocode | No | The paper includes figures illustrating the architecture and state calculation (Figure 1, Figure 2) and describes methods using equations. However, it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps formatted like code. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We leverage five different datasets, all CTDGs and labeled. Each dataset is split into train, validation, and test sets respecting time (i.e., all events in the train are older than the events in validation, and all events in validation are older than the events in the test set). Three of these datasets are public (Kumar et al., 2019) from the social and education domains. In these three datasets, we adopt the identical data partitioning strategy employed by the baseline methods we compare against, which also utilized these datasets. The other two datasets are real-world banking datasets from the AML domain. Due to privacy concerns, we cannot disclose the identity of the FIs nor provide exact details regarding the node features. We refer to the datasets as FI-A and FI-B. The graphs in this use case are constructed by considering the accounts as nodes and the money transfers between accounts as edges. Table 5 shows the details of all the used datasets. |
| Dataset Splits | Yes | Each dataset is split into train, validation, and test sets respecting time (i.e., all events in the train are older than the events in validation, and all events in validation are older than the events in the test set)... Table 5 shows the details of all the used datasets. ... Used split (%), one per dataset: 70-15-15, 60-20-20, 70-15-15, 60-10-30, 60-10-30 |
| Hardware Specification | Yes | Tests were performed on a Linux PC equipped with 24 Intel Xeon CPU cores (3.70GHz) and an NVIDIA GeForce RTX 2080 Ti GPU (11GB). Note that all experiments, including those for link prediction and node classification mentioned in the previous sections, used the same machine. |
| Software Dependencies | No | The Jacobian updates were implemented manually using PyTorch (Fey & Lenssen, 2019). ...to implement that we also leverage the functionalities of PyTorch. ...The hyperparameter optimization process utilizes Optuna (Akiba et al., 2019)... The paper mentions PyTorch and Optuna but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | The hyperparameter optimization process utilizes Optuna (Akiba et al., 2019) for training 100 models. The initial 70 trials are conducted through random sampling, followed by the application of the TPE sampler. Each model incorporates an early stopping mechanism, triggered after 10 epochs without improvement. Table 6 enumerates the hyperparameters and their respective ranges employed in the tuning process of DGS and the baselines. |
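The time-respecting split described in the Open Datasets and Dataset Splits rows (all training events older than validation events, which are older than test events) can be sketched as follows. This is a minimal illustration, not the authors' code; the `"timestamp"` field name and the dict-based event representation are assumptions for the example.

```python
# Hedged sketch of a chronological 70-15-15 split, as described in the paper.
# Events are sorted by time and partitioned so that train < val < test in time.
def chronological_split(events, train_frac=0.70, val_frac=0.15):
    """Split events into train/val/test sets that respect temporal order."""
    events = sorted(events, key=lambda e: e["timestamp"])
    n = len(events)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = events[:n_train]
    val = events[n_train:n_train + n_val]
    test = events[n_train + n_val:]
    return train, val, test
```

With 100 events and the default fractions this yields a 70/15/15 partition; the 60-20-20 and 60-10-30 splits in Table 5 correspond to different fraction arguments.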
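The early stopping mechanism in the Experiment Setup row (stop after 10 epochs without improvement) can be expressed as a small stateful helper. This is an illustrative sketch under the assumption that "improvement" means a strictly higher validation metric; the paper does not specify the exact criterion, and the class name is hypothetical.

```python
class EarlyStopping:
    """Signal a stop after `patience` epochs without validation improvement
    (the paper uses patience = 10). Assumes higher metric is better."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_metric):
        """Record one epoch's validation metric; return True when training
        should stop."""
        if val_metric > self.best:
            self.best = val_metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In an Optuna study matching the described setup, the 70 random trials followed by TPE could be configured via `optuna.samplers.TPESampler(n_startup_trials=70)` and `study.optimize(objective, n_trials=100)`, with an `EarlyStopping` instance checked inside each trial's training loop.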