Scalable Approximate Bayesian Inference for Outlier Detection under Informative Sampling
Authors: Terrance D. Savitsky
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide a simulation study to demonstrate that our approach produces unbiased estimation for the outlying cluster under informative sampling. The method is applied for outlier nomination for the Current Employment Statistics survey conducted by the Bureau of Labor Statistics. |
| Researcher Affiliation | Academia | Terrance D. Savitsky EMAIL U. S. Bureau of Labor Statistics Office of Survey Methods Research Washington, DC 20212, USA |
| Pseudocode | Yes | Appendix A. Hierarchical Clustering Algorithm: Loop over algorithm blocks, A.2 and A.3 until convergence. Algorithm A.1: Initialize local and global cluster objects. Algorithm A.2: Build Local and Global Clusters. Algorithm A.3: Merge global clusters. |
| Open Source Code | No | We implement both algorithms we present, below, in the growclusters package for R (R Core Team 2014), which is written in C++ for fast computation and available from the authors on request. |
| Open Datasets | No | The U.S. Bureau of Labor Statistics (BLS) administers the Current Employment Statistics (CES) survey to over 350000 non-farm, public and private business establishments across the U.S. on a monthly basis, receiving approximately 270000 submitted responses in each month. |
| Dataset Splits | Yes | Observations are next randomly allocated into two sets of equal size; one used to train the model and the other to evaluate the resultant energy. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | We implement both algorithms we present, below, in the growclusters package for R (R Core Team 2014), which is written in C++ for fast computation and available from the authors on request. |
| Experiment Setup | Yes | We chose the values of ($\lambda_L = 1232$, $\lambda_K = 2254$) that maximized the C index for our sampling-weighted hierarchical clustering model. |
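
The random allocation quoted in the Dataset Splits row (two sets of equal size, one for training and one for evaluation) might look roughly like the sketch below. This is an illustration only, not the author's code: the paper's implementation is the growclusters R package, and the function name `split_in_half`, the `seed` parameter, and the handling of an odd observation count are all assumptions made here.

```python
import random


def split_in_half(observations, seed=0):
    """Randomly allocate observations into two sets of equal size.

    Hypothetical helper: the name, the seed argument, and the choice to
    drop one observation when the count is odd are assumptions, not
    details taken from the paper.
    """
    rng = random.Random(seed)      # local generator, so the split is reproducible
    shuffled = list(observations)  # copy, leaving the caller's data untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # First half trains the model; second half evaluates it.
    return shuffled[:half], shuffled[half:2 * half]


# Toy data standing in for survey observations.
train, evaluate = split_in_half(range(10), seed=42)
```

Seeding a local `random.Random` instance keeps the allocation repeatable without touching the global random state, which matters when the split is rerun while tuning other settings.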