SEAD: Unsupervised Ensemble of Streaming Anomaly Detectors

Authors: Saumya Gaurang Shah, Abishek Sankararaman, Balakrishnan Murali Narayanaswamy, Vikramank Singh

ICML 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments on 14 non-trivial public datasets and an internal dataset corroborate our claims. |
| Researcher Affiliation | Industry | Amazon Web Services, Santa Clara, CA, USA. Correspondence to: Saumya Gaurang Shah <EMAIL>. |
| Pseudocode | Yes | The complete pseudocode is in Algorithm 1 (SEAD Algorithm); Algorithm 2 gives SEAD++, which optimizes runtime by sampling. |
| Open Source Code | No | The paper mentions using "open source implementations from PySAD (Yilmaz & Kozat, 2020)" for the base models but does not state that the code for SEAD itself is open source, nor does it provide a link. |
| Open Datasets | Yes | We perform experiments on 15 datasets, of which 11 are from the Outlier Detection Data Sets (ODDS) (Rayana, 2016), 3 are from the USP Data Stream Repository (Souza et al., 2020), and one is an internal telemetry dataset from a multi-server database cloud service. |
| Dataset Splits | Yes | We set the first 100 data points for warm starting the base models and SEAD, but not for evaluation, i.e., it is cold start. To overcome this issue, we split each dataset into chunks of 50 contiguous data points. |
| Hardware Specification | Yes | We performed all experiments on a single c5.2xlarge AWS EC2 instance. |
| Software Dependencies | No | The paper mentions using "open source implementations from PySAD (Yilmaz & Kozat, 2020)" and "tdigest (Dunning & Ertl, 2019)" but does not provide specific version numbers for these or other software libraries/dependencies. |
| Experiment Setup | Yes | For our method SEAD, we choose hyperparameters η = 1, λ = 10^6, and π = Uniform distribution across all experiments. Table 11 lists the hyperparameter configurations for the base models. We set the first 100 data points for warm starting the base models and SEAD. |
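The Experiment Setup row names a learning rate η = 1 and a uniform prior π over base detectors. To make the general idea concrete, the sketch below shows a generic multiplicative-weights ensemble over streaming anomaly scores. It is an illustration only: the class name, the `pseudo_loss` argument, and the update rule are assumptions, and it does not reproduce SEAD's actual unsupervised loss or its λ regularization.

```python
import numpy as np

class ExpWeightsEnsemble:
    """Hypothetical sketch of an exponential-weights ensemble of streaming
    anomaly detectors (not the authors' implementation). eta and the uniform
    initial weights mirror the paper's eta = 1 and pi = Uniform; the
    per-round loss is supplied by the caller as a stand-in for SEAD's
    unsupervised loss."""

    def __init__(self, n_models, eta=1.0):
        self.eta = eta
        # pi = Uniform prior over base detectors.
        self.w = np.full(n_models, 1.0 / n_models)

    def score(self, base_scores):
        # Ensemble anomaly score: weighted average of base-model scores.
        return float(self.w @ np.asarray(base_scores))

    def update(self, pseudo_loss):
        # Multiplicative-weights update: detectors with lower loss gain
        # weight. pseudo_loss is a per-detector loss (placeholder only).
        self.w = self.w * np.exp(-self.eta * np.asarray(pseudo_loss))
        self.w = self.w / self.w.sum()

# Usage: three base detectors emit scores for one stream point; after an
# update, the detector with the lowest loss carries the most weight.
ens = ExpWeightsEnsemble(n_models=3, eta=1.0)
scores = np.array([0.2, 0.9, 0.5])
combined = ens.score(scores)          # uniform weights => plain mean
ens.update(pseudo_loss=np.array([1.0, 0.0, 0.5]))
```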