reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

conDENSE: Conditional Density Estimation for Time Series Anomaly Detection

Authors: Alex Moore, Davide Morelli

JAIR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on 31 time-series, including real-world anomaly detection benchmark datasets and synthetically generated data, show that the model can outperform state-of-the-art deep learning methods.
Researcher Affiliation	Collaboration	Alex Moore EMAIL Huma Therapeutics Ltd, Millbank Tower, 21-24 Millbank, London SW1P 4QP, United Kingdom Corresponding Author Davide Morelli EMAIL Huma Therapeutics Ltd & Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford OX3 7DQ, United Kingdom
Pseudocode	No	The paper describes the model architecture and components (Figure 1), and provides mathematical equations (Equation 3 for cost function), but does not contain a dedicated pseudocode or algorithm block.
Open Source Code	No	The paper states: "We used publicly available code, provided by the original developers, for TranAD and GDN." and provides links to these third-party implementations. However, it does not provide concrete access to the source code for the conDENSE methodology described in this paper.
Open Datasets	Yes	Our model was evaluated using a combination of real-world and synthetic data. Assessment on commonly used anomaly detection benchmark datasets allows us to see how well our model generalises when applied to real-world time-series, with unknown data generating processes. It also facilitates direct comparisons with similar work. On the other hand, assessment on synthetic datasets with known properties allows us to make more concrete claims about the types of anomalies our model can handle. In total we used 11 real-world time-series and 20 synthetic time-series. The paper specifically mentions and cites: UCR (Dau et al., 2019), Server Machine Dataset (SMD) (Su et al., 2019), Secure Water Treatment (SWaT) (Goh et al., 2016), Mars Science Laboratory (MSL) (Nakamura et al., 2020; Tuli et al., 2022). Synthetic data were generated using GutenTAG (Wenig et al., 2022).
Dataset Splits	Yes	Table 1: Real-world dataset statistics (includes Train Size and Test Size for UCR, SMD, SWaT, MSL). "In order to do this, 20% of training data was withheld for use as a validation set."
Hardware Specification	No	The paper does not provide specific hardware details such as GPU models, CPU types, or other detailed computer specifications used for running the experiments. It only mentions training times for various models.
Software Dependencies	No	The paper mentions that "Our models were implemented using TensorFlow" and "All other benchmark models were implemented using TimeEval, an open source evaluation tool for anomaly detection algorithms (Wenig et al., 2022)." However, specific version numbers for TensorFlow or TimeEval are not provided, nor are other key libraries with their versions.
Experiment Setup	Yes	Model-specific hyperparameters are shown in Table 3. conDENSE hyperparameters were optimized using the MSL P1 time-series, with respect to ROC AUC; performance on this dataset is not included in our results. With the exception of POT lm, all conDENSE hyperparameters are shared across all datasets. We use a higher POT lm coefficient for the univariate time-series dataset, in keeping with previous work (Tuli et al., 2022; Su et al., 2019). Table 3 provides the following values: MAF Bijectors 5; MAF Hidden Units [64, 64]; Window Size 10; GRU Hidden Units 5; Latent Dimensions 20; VAE Hidden Units [256, 128]; POT lm 0.993 or 0.97; βr (see Equation 3) 0.1. Our models were implemented using TensorFlow and optimised using the Adam algorithm with a learning rate of 0.001 and a batch size of 128 (these were the suggested default values). Models were trained for up to 50 epochs, although an early stopping rule terminated training early if performance had not improved over three consecutive epochs.
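
Illustrative sketch: The Dataset Splits and Experiment Setup rows describe a 20% validation holdout, a window size of 10, and training for up to 50 epochs with an early stopping rule of three epochs. The plain-Python sketch below shows one way those settings could fit together. It is not the authors' code (the paper reports a TensorFlow implementation and releases no source). All names (TrainingConfig, holdout_split, sliding_windows, train_with_early_stopping) are hypothetical, and three points are assumptions not stated in the source: the holdout is taken chronologically from the end of the training series, windows use a stride of 1, and "not improved" is measured against the best validation loss seen so far.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class TrainingConfig:
    """Values reported in the table above; the field names are illustrative."""
    window_size: int = 10
    learning_rate: float = 0.001   # Adam learning rate (not used in this sketch)
    batch_size: int = 128          # not used in this sketch
    max_epochs: int = 50
    patience: int = 3              # epochs without improvement before stopping
    validation_fraction: float = 0.2


def holdout_split(series: Sequence[float], fraction: float) -> Tuple[Sequence[float], Sequence[float]]:
    """Withhold the final `fraction` of the series for validation.

    Assumption: a chronological split. The source only says 20% was withheld.
    """
    cut = len(series) - int(round(len(series) * fraction))
    return series[:cut], series[cut:]


def sliding_windows(series: Sequence[float], size: int) -> List[List[float]]:
    """Return every contiguous window of length `size` (stride 1, an assumption)."""
    return [list(series[i:i + size]) for i in range(len(series) - size + 1)]


def train_with_early_stopping(run_epoch: Callable[[int], float],
                              config: TrainingConfig) -> Tuple[int, float]:
    """Call `run_epoch(epoch)`, which returns a validation loss, until the
    epoch budget or the patience is exhausted.

    Returns (number of epochs run, best validation loss).
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch in range(config.max_epochs):
        val_loss = run_epoch(epoch)
        epochs_run = epoch + 1
        if val_loss < best_loss:
            best_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= config.patience:
                break
    return epochs_run, best_loss


# Usage with made-up data: a toy series and a scripted list of validation losses.
config = TrainingConfig()
train, val = holdout_split(list(range(100)), config.validation_fraction)
windows = sliding_windows(train, config.window_size)
scripted_losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.6]
epochs_run, best_loss = train_with_early_stopping(lambda e: scripted_losses[e], config)
```

With the scripted losses above, training stops after the third consecutive epoch that fails to beat 0.7, so the final value 0.6 is never reached. Whether the authors' rule also restores the best weights is not stated in the source.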