SPADE: Semi-supervised Anomaly Detection under Distribution Mismatch
Authors: Jinsung Yoon, Kihyuk Sohn, Chun-Liang Li, Sercan O Arik, Tomas Pfister
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to highlight the benefits of the proposed method, SPADE, in various practical settings of semi-supervised learning with distribution mismatch. We consider multiple anomaly detection datasets for image and tabular data types. As image data, we use MVTec anomaly detection (Bergmann et al., 2019) and Magnetic tile datasets (Huang et al., 2020). As tabular data, we use Covertype, Thyroid, and Drug datasets (see Appendix for detailed data description). In Sec. 5.4, we further utilize two real-world fraud detection datasets (Kaggle credit and Xente) to evaluate the performance of SPADE. We run 5 independent experiments and report average values (standard deviations can be found in Appendix C). We use AUC as the evaluation metric. |
| Researcher Affiliation | Industry | Google Cloud AI |
| Pseudocode | Yes | Algorithm 1 Semi-supervised Pseudo-labeler Anomaly Detection with Ensembling (SPADE). |
| Open Source Code | No | The paper provides GitHub links for the baselines used (e.g., VIME, FixMatch, DANN, pulearn, pytorch-cutpaste) in footnotes of Appendix B.6, but does not provide a link or an explicit statement about open-sourcing the SPADE methodology itself. Concrete access to source code for the described methodology is therefore not provided. |
| Open Datasets | Yes | We use MVTec anomaly detection (Bergmann et al., 2019) and Magnetic tile datasets (Huang et al., 2020). As tabular data, we use Covertype, Thyroid, and Drug datasets (see Appendix for detailed data description). In Sec. 5.4, we further utilize two real-world fraud detection datasets (Kaggle credit and Xente). Kaggle credit card fraud (footnote: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud), Xente fraud detection (footnote: https://zindi.africa/competitions/xente-fraud-detection-challenge/data). Appendix B.1: Thyroid data (footnote: https://archive.ics.uci.edu/ml/datasets/thyroid+disease), Drug data (footnote: https://archive.ics.uci.edu/ml/datasets/Drug+consumption+%28quantified%29), Covertype data (footnote: https://archive.ics.uci.edu/ml/datasets/covertype), MVTec data (footnote: https://www.mvtec.com/company/research/datasets/mvtec-ad), Magnetic Tile dataset (footnote: https://github.com/abin24/Magnetic-tile-defect-datasets.) |
| Dataset Splits | Yes | In all experiments, unless the dataset comes with its own train and test split, we randomly divide the dataset into disjoint train and test data. Then, we further divide the training data into disjoint labeled and unlabeled data. Note that we only provide 5% of the data as labeled data for tabular datasets and 20% for image datasets, for the scenario of new types of anomalies. For Thyroid data... We use the pre-defined training and testing dataset division. For Drug data... We divide the entire dataset into training (50%) and testing (50%). For Covertype data... We divide the entire dataset into training (50%) and testing (50%). For MVTec data... we first mix given training and testing data and divide them into training (80%) and testing (20%). For Magnetic Tile dataset... We mix given training and testing data and divide them into training (80%) and testing (20%). In our experiments [for fraud detection], we split the train and test data based on the measurement time. Latest samples are included in the testing data (50%) and early acquired data is included in the training data (50%). We further divide the training data as labeled and unlabeled data. Early acquired data are included in the labeled data (5%-20%), while later acquired data are included in the unlabeled data (80%-95%). |
| Hardware Specification | Yes | All the experiments are done on a single V100 GPU. |
| Software Dependencies | No | The paper mentions various software components used (e.g., scikit-learn, VIME, FixMatch, DANN, Weighted Elkanoto, Bagging PU, CutPaste, Gaussian Distribution Estimator) but does not provide version numbers for these libraries or frameworks as used in the authors' implementation of SPADE. Although some baselines link to GitHub repositories, the exact versions used for the experiments are not specified. |
| Experiment Setup | Yes | We set both α and β as 1.0 for the experiments. Training loss is used as the convergence criterion: if no improvement in the loss is observed for 5 epochs, we treat the models as converged. For image data, we use ResNet-18 as the base network architecture. For representation learning, we incorporate CutPaste (Li et al., 2021) for the MVTec and Magnetic Tile datasets. We follow all the training details in (Li et al., 2021) (including all the hyper-parameters). For tabular data, we use a two-layer perceptron as the base network architecture, where the hidden dimension is half of the original feature dimension. Pseudo-labelers consist of 5 Gaussian Distribution Estimator (GDE) based OCCs. |
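The setup above mentions pseudo-labelers built from an ensemble of 5 Gaussian Distribution Estimator (GDE) one-class classifiers. A minimal sketch of that idea, assuming bootstrap-resampled per-feature Gaussians and a squared z-distance anomaly score (all function names here are illustrative, not the authors' implementation):

```python
import random
import statistics

def fit_gde(samples):
    """Fit a per-feature (mean, std) Gaussian on assumed-normal samples."""
    dims = list(zip(*samples))
    return [(statistics.fmean(d), statistics.pstdev(d) or 1e-6) for d in dims]

def gde_score(params, x):
    """Anomaly score: squared z-distance summed over features."""
    return sum(((v - mu) / sd) ** 2 for (mu, sd), v in zip(params, x))

def ensemble_scores(train, test, n_models=5, seed=0):
    """Average the scores of n_models GDEs, each fit on a bootstrap resample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        boot = [rng.choice(train) for _ in train]  # bootstrap resample
        models.append(fit_gde(boot))
    return [statistics.fmean(gde_score(m, x) for m in models) for x in test]

rng = random.Random(1)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
test = [(0.1, -0.2), (6.0, 6.0)]  # in-distribution point vs. far-away point
s_in, s_out = ensemble_scores(normal, test)
print(s_out > s_in)  # the outlier should score higher
```

In SPADE, scores like these would be thresholded to pseudo-label the unlabeled data; the ensemble of 5 estimators reduces the variance of any single one-class classifier.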