Automatic Unsupervised Outlier Model Selection

Authors: Yue Zhao, Ryan Rossi, Leman Akoglu

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments show that selecting a model by METAOD significantly outperforms no model selection (e.g., always using the same popular model or the ensemble of many) as well as other meta-learning techniques that we tailored for UOMS."
Researcher Affiliation | Collaboration | Yue Zhao (Carnegie Mellon University, EMAIL); Ryan A. Rossi (Adobe Research, EMAIL); Leman Akoglu (Carnegie Mellon University, EMAIL)
Pseudocode | Yes | "We also provide the detailed steps of METAOD in pseudo-code, for both meta-training (offline) and model selection (online), in Appendix D, Algo. 1."
Open Source Code | Yes | "We open-source METAOD and our meta-learning database for practical use and to foster further research on the UOMS problem." Code available at: https://github.com/yzhao062/UOMS
Open Datasets | Yes | "1. Proof-of-Concept (POC) testbed contains 100 datasets that form clusters of similar datasets, where 5 different detection tasks ('siblings') are created from each one of 20 'mothersets'. 2. Stress Testing (ST) testbed consists of 62 independent datasets from 3 different public-domain OD dataset repositories, which exhibit relatively lower similarity to one another. We use the benchmark datasets by Emmott et al. [11], who created childsets from 20 independent mothersets by sampling." (Datasets: https://ir.library.oregonstate.edu/concern/datasets/47429f155)
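The POC testbed structure (5 sibling tasks sampled from each of 20 mothersets, giving 100 datasets) can be sketched as below. This is an illustrative assumption, not the actual childset-construction procedure of Emmott et al. [11]; `make_siblings`, the sampling fraction `frac`, and the seed are all hypothetical.

```python
import random

def make_siblings(motherset, n_siblings=5, frac=0.8, seed=0):
    """Create sibling detection tasks by row-subsampling one motherset.

    Hypothetical sketch: n_siblings=5 matches the paper's testbed layout,
    while frac and seed are arbitrary choices for illustration.
    """
    rng = random.Random(seed)
    size = int(frac * len(motherset))
    return [rng.sample(motherset, size) for _ in range(n_siblings)]

# 20 mothersets x 5 siblings each -> a 100-dataset POC-style testbed
mothersets = [[(m, i) for i in range(500)] for m in range(20)]
poc_testbed = [child for m in mothersets for child in make_siblings(m)]
```

Siblings drawn from the same motherset are therefore similar by construction, which is what lets the POC testbed probe meta-learning under high meta-train/meta-test similarity.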
Dataset Splits | Yes | "We split them into 5 folds for cross-validation, each test fold containing 20 independent childsets without siblings. For evaluation on ST, we use leave-one-out cross-validation; each time using 61 datasets as meta-train."
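The two split protocols can be sketched with scikit-learn; the dataset identifiers below are placeholders, and the plain `KFold` shown here does not enforce the paper's sibling-exclusion constraint (a grouped split over mothersets would be needed for that):

```python
from sklearn.model_selection import KFold, LeaveOneOut

# Hypothetical dataset identifiers standing in for the real testbeds:
# POC has 100 datasets (20 mothersets x 5 siblings), ST has 62.
poc_datasets = [f"poc_{i}" for i in range(100)]
st_datasets = [f"st_{i}" for i in range(62)]

# POC protocol: 5-fold cross-validation over datasets, 20 per test fold.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
poc_fold_sizes = [len(test_idx) for _, test_idx in kf.split(poc_datasets)]

# ST protocol: leave-one-out, using the remaining 61 datasets as meta-train.
loo = LeaveOneOut()
st_rounds = [(len(tr), len(te)) for tr, te in loo.split(st_datasets)]
```

With 100 POC datasets, each test fold holds exactly 20; with 62 ST datasets, leave-one-out yields 62 rounds of 61 meta-train datasets each, matching the quoted setup.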
Hardware Specification | Yes | "All models are built using the PyOD library [61] on an Intel i7-9700 @ 3.00 GHz, 64 GB RAM, 8-core workstation."
Software Dependencies | No | The paper mentions the PyOD library [61] but does not provide specific version numbers for it or for any other ancillary software dependencies.
Experiment Setup | Yes | "We pair 8 SOTA OD algorithms and their corresponding hyperparameters to compose a model set M with 302 unique models. (See Appendix A, Table 2 for the complete list.)"
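One way to materialize such a model set is a Cartesian product over per-algorithm hyperparameter grids. The grids below are illustrative placeholders only; the paper's actual grids, which yield 302 models across 8 algorithms, are listed in Appendix A, Table 2.

```python
from itertools import product

# Illustrative hyperparameter grids (4 algorithms shown, not the paper's 8).
hyper_grids = {
    "LOF":     {"n_neighbors": [1, 5, 10, 20, 50]},
    "kNN":     {"n_neighbors": [1, 5, 10, 20, 50], "method": ["largest", "mean"]},
    "iForest": {"n_estimators": [50, 100, 200], "max_features": [0.5, 1.0]},
    "OCSVM":   {"nu": [0.1, 0.5, 0.9], "kernel": ["rbf", "sigmoid"]},
}

def expand(algo, grid):
    """Yield (algorithm, hyperparameter-dict) pairs from one grid."""
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        yield (algo, dict(zip(keys, values)))

model_set = [cfg for algo, grid in hyper_grids.items()
             for cfg in expand(algo, grid)]
# 5 + 10 + 6 + 6 = 27 illustrative configurations
```

Each `(algorithm, hyperparameters)` pair is one candidate model; METAOD's task is then to pick one such pair for a new dataset without labels.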