Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Unsupervised Anomaly Detection with Rejection

Authors: Lorenzo Perini, Jesse Davis

NeurIPS 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally address the following research questions: Q1. How does REJEX's cost compare to the baselines? Q2. How does varying the cost function affect the results? Q3. How does REJEX's CPU time compare to the baselines? Q4. Do the theoretical results hold in practice?
Researcher Affiliation | Academia | Lorenzo Perini, DTAI lab & Leuven.AI, KU Leuven, Belgium (EMAIL); Jesse Davis, DTAI lab & Leuven.AI, KU Leuven, Belgium (EMAIL)
Pseudocode | No | The paper includes mathematical formulations and proofs (e.g., Theorem 3.1, Theorem 3.5, Theorem 3.6, Theorem 3.8), but it does not feature any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code available at: https://github.com/Lorenzo-Perini/RejEx.
Open Datasets | Yes | We carry out our study on 34 publicly available benchmark datasets, widely used in the literature [23]. These datasets can be downloaded at the following link: https://github.com/Minqi824/ADBench/tree/main/datasets/Classical.
Dataset Splits | Yes | (1) we split the dataset into training and test sets (80-20) using 5-fold cross-validation; ... This uses (any) unsupervised detector to obtain pseudo labels for the training set. It then sets the rejection threshold as follows: 1) it creates a held-out validation set (20%)
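The quoted split procedure (80-20 train/test via 5-fold cross-validation, plus a 20% held-out validation set carved from each training fold) can be sketched as follows. This is a minimal illustration with placeholder data, not the authors' code; the array sizes and random seeds are assumptions.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

# Placeholder data standing in for one of the 34 benchmark datasets.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# 5-fold CV: each fold yields an 80-20 train/test split.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    X_train_full, X_test = X[train_idx], X[test_idx]
    # A further 20% of the training data is held out as a
    # validation set for setting the rejection threshold.
    X_train, X_val = train_test_split(
        X_train_full, test_size=0.2, random_state=0
    )
    assert len(X_test) == 20 and len(X_val) == 16 and len(X_train) == 64
```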
Hardware Specification | Yes | All experiments were run on an Intel(R) Xeon(R) Silver 4214 CPU.
Software Dependencies | No | The paper mentions software tools like PYOD [66] and Bayesian Optimization [17] but does not specify their version numbers or the version of the programming language (e.g., Python) used.
Experiment Setup | Yes | We set our tolerance ε = 2e^(−T) with T = 32. Note that the exponential smooths out the effect of T, which makes setting a different T have little impact. We use a set of 12 unsupervised anomaly detectors implemented in PYOD [66] with default hyperparameters [62] because the unsupervised setting does not allow us to tune them: KNN [3], IFOREST [42], LOF [5], OCSVM [58], AE [8], HBOS [21], LODA [53], COPOD [39], GMM [2], ECOD [40], KDE [36], INNE [4]. We set all the baselines' rejection thresholds via Bayesian Optimization with 50 calls [17].
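The setup above runs a battery of unsupervised detectors with default hyperparameters and scores points by anomalousness. A minimal sketch of that pattern, using scikit-learn analogues of IFOREST, LOF, and OCSVM as stand-ins for the PYOD implementations (the data and detector subset are assumptions for illustration):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

# Placeholder data standing in for a benchmark dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))

# Default hyperparameters throughout: the unsupervised setting
# provides no labels to tune them with.
detectors = {
    "IFOREST": IsolationForest(random_state=0),
    "LOF": LocalOutlierFactor(novelty=True),
    "OCSVM": OneClassSVM(),
}

scores = {}
for name, det in detectors.items():
    det.fit(X)
    # Negate score_samples so that higher values mean more anomalous.
    scores[name] = -det.score_samples(X)
    assert scores[name].shape == (200,)
```

In the paper these per-detector anomaly scores feed the rejection step, where each baseline's threshold is tuned by Bayesian Optimization (50 calls); that tuning layer is omitted here.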