PyOD: A Python Toolbox for Scalable Outlier Detection

Authors: Yue Zhao, Zain Nasrullah, Zheng Li

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | PyOD is an open-source Python toolbox for performing scalable outlier detection on multivariate data. Uniquely, it provides access to a wide range of outlier detection algorithms, including established outlier ensembles and more recent neural network-based approaches, under a single, well-documented API designed for use by both practitioners and researchers. ... In addition to the outlier detection algorithms, a set of helper and utility functions (`generate_data`, `evaluate_print` and `visualize`) are included in the library for quick model exploration and evaluation. The two-dimensional artificial data used in the example is created by `generate_data`, which generates inliers from a Gaussian distribution and outliers from a uniform distribution. An example of using `visualize` is shown in Figure 1.
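The quoted description of `generate_data` (Gaussian inliers, uniform outliers) can be sketched in plain NumPy. This is a hypothetical stand-in, not PyOD's actual `generate_data` implementation; the function name `make_toy_outlier_data` and all parameter defaults are illustrative assumptions.

```python
import numpy as np

def make_toy_outlier_data(n_inliers=180, n_outliers=20, n_features=2, seed=42):
    """Sketch of the behavior described in the paper: inliers drawn from a
    Gaussian distribution, outliers from a uniform distribution."""
    rng = np.random.default_rng(seed)
    inliers = rng.normal(loc=0.0, scale=1.0, size=(n_inliers, n_features))
    # Outliers are spread uniformly over a wider box than the inlier cloud.
    outliers = rng.uniform(low=-6.0, high=6.0, size=(n_outliers, n_features))
    X = np.vstack([inliers, outliers])
    y = np.concatenate([np.zeros(n_inliers), np.ones(n_outliers)])  # 1 = outlier
    return X, y

X, y = make_toy_outlier_data()
print(X.shape, int(y.sum()))  # (200, 2) 20
```

The labels make such synthetic data convenient for quickly checking a detector's ranking quality, which matches the paper's stated purpose of the helper functions.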
Researcher Affiliation | Academia | Yue Zhao, EMAIL, Carnegie Mellon University, Pittsburgh, PA 15213, USA. Zain Nasrullah, EMAIL, University of Toronto, Toronto, ON M5S 2E4, Canada. Zheng Li, EMAIL, Northeastern University Toronto, Toronto, ON M5X 1E2, Canada.
Pseudocode | No | The paper includes 'Code Snippet 1', which presents actual Python code demonstrating the PyOD API rather than a structured pseudocode or algorithm block.
Open Source Code | Yes | PyOD is an open-source Python toolbox for performing scalable outlier detection on multivariate data. ... PyOD is compatible with both Python 2 and 3 and can be installed through the Python Package Index (PyPI) or https://github.com/yzhao062/pyod.
Open Datasets | No | The paper uses `generate_data` only for a demonstration: 'The two-dimensional artificial data used in the example is created by generate_data, which generates inliers from a Gaussian distribution and outliers from a uniform distribution.' This is synthetically generated data, not a publicly available dataset with concrete access information. No other datasets with specific access details are mentioned for experimental evaluation.
Dataset Splits | Yes | Code Snippet 1: >>> X_train, y_train, X_test, y_test = generate_data( ... n_train=200, n_test=100, n_features=2)
Hardware Specification | No | The paper mentions that 'optimization instruments are employed when possible: just-in-time (JIT) compilation and parallelization are enabled in select models for scalable outlier detection' and that 'Parallelization for multi-core execution is also available for a set of algorithms using joblib.' However, it does not specify the hardware (e.g., CPU or GPU models) used for any experiments or performance evaluations.
Software Dependencies | No | 'PyOD is compatible with both Python 2 and 3 using six; it relies on numpy, scipy and scikit-learn as well. Neural networks such as autoencoders and SO_GAAL additionally require Keras. To enhance model scalability, select algorithms (Table 1) are optimized with JIT using numba. Parallelization for multi-core execution is also available for a set of algorithms using joblib.' The paper lists software dependencies but does not provide specific version numbers for any of them.
Experiment Setup | Yes | Code Snippet 1: >>> clf = ABOD(method="fast") # initialize detector
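The ABOD detector quoted above scores points by the spread of angles they form with other points; outliers, sitting far from the cloud, see the remaining data under a narrow angular range. A minimal NumPy sketch of that intuition follows. It is NOT PyOD's implementation (PyOD's `method="fast"` variant restricts the computation to k nearest neighbors and weights by distance); `abod_scores` and its exact scoring rule are illustrative assumptions only.

```python
import numpy as np

def abod_scores(X):
    """Toy angle-based outlier scores: for each point, the variance of the
    cosine similarities between its difference vectors to all other points.
    A LOW variance suggests an outlier (all other points lie in one
    direction), mirroring the ABOD intuition in simplified form."""
    n = len(X)
    scores = np.empty(n)
    for i in range(n):
        diffs = np.delete(X, i, axis=0) - X[i]           # vectors to the rest
        norms = np.linalg.norm(diffs, axis=1, keepdims=True)
        unit = diffs / np.clip(norms, 1e-12, None)       # unit directions
        cos = unit @ unit.T                              # pairwise cosines
        iu = np.triu_indices(len(unit), k=1)             # each pair once
        scores[i] = cos[iu].var()                        # angular spread
    return scores

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(30, 2)), [[8.0, 8.0]]])  # one obvious outlier
s = abod_scores(X)
print(int(np.argmin(s)))  # index 30: the outlier has the lowest angle variance
```

The isolated point at (8, 8) sees every other point in roughly the same direction, so its cosines cluster near 1 and their variance is minimal, whereas points inside the Gaussian cloud see neighbors in all directions.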