Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning-Based Link Anomaly Detection in Continuous-Time Dynamic Graphs
Authors: Tim Postuvan, Claas Grohnfeldt, Michele Russo, Giulio Lovisotto
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive benchmarks on synthetic and real-world datasets featuring synthetic and labeled organic anomalies and employing six state-of-the-art link prediction methods validate our taxonomy and generation processes for anomalies and benign graphs, as well as our approach to adapting methods for anomaly detection. Our results reveal that different learning methods excel in capturing different aspects of graph normality and detecting different types of anomalies. We conclude with a comprehensive list of findings highlighting opportunities for future research. |
| Researcher Affiliation | Collaboration | Tim Poštuvan EMAIL EPFL Claas Grohnfeldt EMAIL Huawei Technologies Michele Russo EMAIL Huawei Technologies Giulio Lovisotto EMAIL Huawei Technologies |
| Pseudocode | Yes | Algorithm 1 outlines the complete process. To enforce consistencies in the generated CTDG, we do the following: ... Algorithm 1: Synthetic Dynamic Graph Generation |
| Open Source Code | Yes | The code is available at https://github.com/timpostuvan/CTDG-link-anomaly-detection. |
| Open Datasets | Yes | We run experiments on eight different datasets. The first one is Temp Synth Graph, a CTDG we generated using Alg. 1 with 10,000 nodes (see § 3.3). We use five benign real-world datasets: Wikipedia, Reddit, MOOC, Enron, and UCI from the work of Poursafaei et al. (2022). Finally, we experiment on two CTDGs from the cybersecurity domain with labeled organic anomalies: LANL (Kent, 2016) and Darpa-Theia (Reha et al., 2023). |
| Dataset Splits | Yes | In datasets with synthetic anomalies, we chronologically divide each dataset into training (70%), validation (15%), and testing (15%). For the two datasets with organic anomalies, LANL and Darpa-Theia, we use all edges that occurred before the first anomalous edge for training, while the remaining data is split so that validation and testing sets contain an equal number of anomalous edges. |
| Hardware Specification | Yes | We execute experiments on a cluster with 1000 Intel(R) Xeon(R) CPUs @ 2.60GHz. Note that, since experiments for Darpa-Theia would take months under our model configuration and hardware setup, we run EdgeBank and EdgeBank_tw with a batch size of 100 (rather than 1, see § 5.1), and models on this dataset are stopped after 25 days of execution time. In practice, only TGN does not converge in this timespan. The results for the best checkpoint by the 25th day are reported. The rest of the model configurations are as follows: ... Measurements are performed on an Ubuntu machine equipped with one Intel(R) Xeon(R) CPU E5-2658A v3 @ 2.20GHz with 24 physical cores. |
| Software Dependencies | No | The paper mentions various models and algorithms like Adam optimizer, Gated Recurrent Units, and Transformer architectures, but does not specify software library versions (e.g., PyTorch 1.x, Python 3.x, CUDA x.x) which are necessary for reproducible software dependencies. |
| Experiment Setup | Yes | We train models for the task of link anomaly detection of Def. 2. Following our considerations from § 4, we use binary cross-entropy loss as in Eq. 4 and the negative sampler in Eq. 5. For each method, the same MLP prediction head takes the edge (including time and edge attributes) as input and predicts the probability p(e) of the link's existence. The link anomaly score is then computed as 1 - p(e). We use the Area Under the Receiver Operating Characteristic curve (AUC) as the main evaluation metric, but we also report Average Precision (AP) and Recall@k metrics in App. C. In datasets with synthetic anomalies, we chronologically divide each dataset into training (70%), validation (15%), and testing (15%). For the two datasets with organic anomalies, LANL and Darpa-Theia, we use all edges that occurred before the first anomalous edge for training, while the remaining data is split so that validation and testing sets contain an equal number of anomalous edges. For testing, we select models based on their AUC on the validation set. Synthetic anomaly injection: In Temp Synth Graph, Wikipedia, Reddit, UCI, Enron, and MOOC, we inject 5% of synthetic anomalies in the validation and test splits with the procedure described in § 3.2. Reference benign edges are sampled within the same data split and with probabilities proportional to o_e^0.5, with o_e representing the number of occurrences of edge e in that split. For c, t-c, s-c, and t-s-c anomalies, following Huang et al. (2023b), we adopt cosine similarity as the comparison metric (d in § 3.2) with K = 10 randomly sampled edge messages; for s-c anomalies, W = 20. Implementation details: We optimize parametric models with Adam (Kingma & Ba, 2015), for 50 epochs, with an early stopping patience of 20. We perform a grid search for the learning rate in the range [10^-3, 10^-6]. Batch size is always 200, except for EdgeBank and EdgeBank_tw, where we set it to 1 to prevent inaccurate look-ups within the same batch. Models have 100k trainable parameters, with hyper-parameters adapted from Yu et al. (2023). We repeat each experimental run with three weight initializations, with the exception of Darpa-Theia, where we only use one initialization due to compute limitations. When introducing synthetic anomalies, we also average results across three randomly seeded sets of injected anomalies. |
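The chronological 70%/15%/15% split quoted in the Dataset Splits row can be sketched as follows. This is an illustrative reconstruction, not the authors' code (their implementation is in the linked repository); the function name and the `(src, dst, timestamp)` edge representation are assumptions.

```python
# Minimal sketch of a chronological train/val/test split for a
# continuous-time dynamic graph, assuming `edges` is a list of
# (src, dst, timestamp) tuples. Names are illustrative, not taken
# from the paper's repository.

def chronological_split(edges, train_frac=0.70, val_frac=0.15):
    """Sort edges by timestamp and cut them into train/val/test spans."""
    edges = sorted(edges, key=lambda e: e[2])  # order by time
    n = len(edges)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return edges[:train_end], edges[train_end:val_end], edges[val_end:]
```

Sorting before cutting ensures no future edge leaks into an earlier split, which is the point of a chronological (rather than random) split for temporal graphs.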
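The scoring and evaluation scheme quoted in the Experiment Setup row (anomaly score = 1 - p(e), evaluated by AUC) can be sketched as below. This is a hedged reimplementation, not the authors' code: the dependency-free AUC uses the Mann-Whitney formulation instead of a library call such as scikit-learn's `roc_auc_score`, and all function names are assumptions.

```python
# Sketch of turning link-prediction probabilities into link anomaly
# scores and evaluating them with AUC. Illustrative only; function
# names are not from the paper's repository.

def anomaly_scores(link_probs):
    """Anomaly score of an edge e is 1 minus its predicted existence probability p(e)."""
    return [1.0 - p for p in link_probs]

def auc(scores, labels):
    """AUC as P(anomaly score > benign score), ties counted as 1/2
    (the Mann-Whitney U formulation); labels: 1 = anomalous, 0 = benign."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because AUC is rank-based, the monotone transform 1 - p(e) means a good link predictor (high p for benign edges, low p for anomalies) directly yields a high anomaly-detection AUC.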