PyOD: A Python Toolbox for Scalable Outlier Detection

Authors: Yue Zhao, Zain Nasrullah, Zheng Li

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | PyOD is an open-source Python toolbox for performing scalable outlier detection on multivariate data. Uniquely, it provides access to a wide range of outlier detection algorithms, including established outlier ensembles and more recent neural network-based approaches, under a single, well-documented API designed for use by both practitioners and researchers. ... In addition to the outlier detection algorithms, a set of helper and utility functions (`generate_data`, `evaluate_print` and `visualize`) are included in the library for quick model exploration and evaluation. The two-dimensional artificial data used in the example is created by `generate_data`, which generates inliers from a Gaussian distribution and outliers from a uniform distribution. An example of using `visualize` is shown in Figure 1.
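The quoted description of `generate_data` (Gaussian inliers, uniform outliers) can be sketched in plain NumPy. This is a hypothetical stand-in, not PyOD's actual `generate_data` implementation; the function name `make_toy_outlier_data` and all parameter defaults are illustrative assumptions.

```python
import numpy as np

def make_toy_outlier_data(n_inliers=180, n_outliers=20, n_features=2, seed=42):
    """Sketch of the behavior described in the paper: inliers drawn from a
    Gaussian distribution, outliers from a uniform distribution."""
    rng = np.random.default_rng(seed)
    inliers = rng.normal(loc=0.0, scale=1.0, size=(n_inliers, n_features))
    # Outliers are spread uniformly over a wider box than the inlier cloud.
    outliers = rng.uniform(low=-6.0, high=6.0, size=(n_outliers, n_features))
    X = np.vstack([inliers, outliers])
    y = np.concatenate([np.zeros(n_inliers), np.ones(n_outliers)])  # 1 = outlier
    return X, y

X, y = make_toy_outlier_data()
print(X.shape, int(y.sum()))  # (200, 2) 20
```

The labels make such synthetic data convenient for quickly checking a detector's ranking quality, which matches the paper's stated purpose of the helper functions.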
Researcher Affiliation | Academia | Yue Zhao, EMAIL, Carnegie Mellon University, Pittsburgh, PA 15213, USA. Zain Nasrullah, EMAIL, University of Toronto, Toronto, ON M5S 2E4, Canada. Zheng Li, EMAIL, Northeastern University Toronto, Toronto, ON M5X 1E2, Canada.
Pseudocode | No | The paper includes 'Code Snippet 1', which presents actual Python code demonstrating the PyOD API rather than a structured pseudocode or algorithm block.
Open Source Code | Yes | PyOD is an open-source Python toolbox for performing scalable outlier detection on multivariate data. ... PyOD is compatible with both Python 2 and 3 and can be installed through the Python Package Index (PyPI) or https://github.com/yzhao062/pyod.
Open Datasets | No | The paper uses `generate_data` only for a demonstration: 'The two-dimensional artificial data used in the example is created by generate_data, which generates inliers from a Gaussian distribution and outliers from a uniform distribution.' This is synthetically generated data, not a publicly available dataset with concrete access information. No other datasets with specific access details are mentioned for experimental evaluation.
Dataset Splits | Yes | Code Snippet 1: >>> X_train, y_train, X_test, y_test = generate_data( ... n_train=200, n_test=100, n_features=2)
Hardware Specification | No | The paper mentions that 'optimization instruments are employed when possible: just-in-time (JIT) compilation and parallelization are enabled in select models for scalable outlier detection' and that 'Parallelization for multi-core execution is also available for a set of algorithms using joblib.' However, it does not specify the hardware (e.g., CPU or GPU models) used for any experiments or performance evaluations.
Software Dependencies | No | 'PyOD is compatible with both Python 2 and 3 using six; it relies on numpy, scipy and scikit-learn as well. Neural networks such as autoencoders and SO_GAAL additionally require Keras. To enhance model scalability, select algorithms (Table 1) are optimized with JIT using numba. Parallelization for multi-core execution is also available for a set of algorithms using joblib.' The paper lists software dependencies but does not provide specific version numbers for any of them.
Experiment Setup | Yes | Code Snippet 1: >>> clf = ABOD(method="fast") # initialize detector
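The ABOD detector quoted above scores points by the spread of angles they form with other points; outliers, sitting far from the cloud, see the remaining data under a narrow angular range. A minimal NumPy sketch of that intuition follows. It is NOT PyOD's implementation (PyOD's `method="fast"` variant restricts the computation to k nearest neighbors and weights by distance); `abod_scores` and its exact scoring rule are illustrative assumptions only.

```python
import numpy as np

def abod_scores(X):
    """Toy angle-based outlier scores: for each point, the variance of the
    cosine similarities between its difference vectors to all other points.
    A LOW variance suggests an outlier (all other points lie in one
    direction), mirroring the ABOD intuition in simplified form."""
    n = len(X)
    scores = np.empty(n)
    for i in range(n):
        diffs = np.delete(X, i, axis=0) - X[i]           # vectors to the rest
        norms = np.linalg.norm(diffs, axis=1, keepdims=True)
        unit = diffs / np.clip(norms, 1e-12, None)       # unit directions
        cos = unit @ unit.T                              # pairwise cosines
        iu = np.triu_indices(len(unit), k=1)             # each pair once
        scores[i] = cos[iu].var()                        # angular spread
    return scores

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(30, 2)), [[8.0, 8.0]]])  # one obvious outlier
s = abod_scores(X)
print(int(np.argmin(s)))  # index 30: the outlier has the lowest angle variance
```

The isolated point at (8, 8) sees every other point in roughly the same direction, so its cosines cluster near 1 and their variance is minimal, whereas points inside the Gaussian cloud see neighbors in all directions.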