Scalable Approximate Bayesian Inference for Outlier Detection under Informative Sampling

Authors: Terrance D. Savitsky

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide a simulation study to demonstrate that our approach produces unbiased estimation for the outlying cluster under informative sampling. The method is applied for outlier nomination for the Current Employment Statistics survey conducted by the Bureau of Labor Statistics.
Researcher Affiliation | Academia | Terrance D. Savitsky EMAIL U.S. Bureau of Labor Statistics, Office of Survey Methods Research, Washington, DC 20212, USA
Pseudocode | Yes | Appendix A. Hierarchical Clustering Algorithm. Loop over algorithm blocks A.2 and A.3 until convergence. Algorithm A.1: Initialize local and global cluster objects. Algorithm A.2: Build Local and Global Clusters. Algorithm A.3: Merge global clusters.
Open Source Code | No | We implement both algorithms we present, below, in the growclusters package for R (R Core Team 2014), which is written in C++ for fast computation and available from the authors on request.
Open Datasets | No | The U.S. Bureau of Labor Statistics (BLS) administers the Current Employment Statistics (CES) survey to over 350000 non-farm, public and private business establishments across the U.S. on a monthly basis, receiving approximately 270000 submitted responses in each month.
Dataset Splits | Yes | Observations are next randomly allocated into two sets of equal size; one used to train the model and the other to evaluate the resultant energy.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | We implement both algorithms we present, below, in the growclusters package for R (R Core Team 2014), which is written in C++ for fast computation and available from the authors on request.
Experiment Setup | Yes | We chose the values of (λL = 1232, λK = 2254) that maximized the C index for our sampling-weighted hierarchical clustering model.