Bagged Regularized k-Distances for Anomaly Detection

Authors: Yuchao Cai, Hanfang Yang, Yuheng Ma, Hanyuan Hang

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On the practical side, we conduct numerical experiments to illustrate the insensitivity of the parameter selection of our algorithm compared with other state-of-the-art distance-based methods. Furthermore, our method achieves superior performance on real-world datasets with the introduced bagging technique compared to other approaches. [...] Section 5 presents numerical experiments.
Researcher Affiliation | Collaboration | Yuchao Cai, EMAIL, Department of Statistics and Data Science, National University of Singapore, 117546, Singapore [...] Hanyuan Hang, EMAIL, Hong Kong Research Institute, Contemporary Amperex Technology (Hong Kong) Limited, Hong Kong Science Park, New Territories, Hong Kong
Pseudocode | Yes | Algorithm 1: Surrogate Risk Minimization (SRM) [...] Algorithm 2: Bagged Regularized k-Distances for Anomaly Detection (BRDAD)
Open Source Code | No | The paper provides neither an explicit statement about nor a link to source code for the proposed method.
Open Datasets | Yes | To provide an extensive experimental evaluation, we use the latest anomaly detection benchmark repository named ADBench established by Han et al. (2022).
Dataset Splits | No | The paper groups datasets into small, medium, and large by sample size and sets the number of bagging rounds B accordingly. It also states, "In practice, when B is fixed, we randomly divide the data into B subsets, each containing either ⌊n/B⌋ or ⌊n/B⌋ + 1 samples." However, it does not report percentages or absolute counts for training, validation, and test splits in the overall experimental evaluation on the ADBench datasets.
Hardware Specification | No | The paper discusses computational efficiency and parallel computation but does not specify the hardware (e.g., CPU or GPU models, memory, or cloud instances) used to run the experiments.
Software Dependencies | No | The paper mentions using "the implementation of the Python package PyOD with its default parameters" for comparison methods such as k-NN, LOF, and OCSVM, and "the author's implementation" for DTM and PIDForest. However, it does not specify version numbers for Python or any of these packages, which are necessary for reproducibility.
Experiment Setup | Yes | (i) BRDAD is our proposed algorithm, with details provided in Algorithm 2. The choice of B depends on the sample size: for n ∈ (0, 10,000], (10,000, 50,000], and (50,000, +∞), we set B = 1, 5, and 10, respectively. [...] (ii) Distance-To-Measure (DTM) (Gu et al., 2019) [...] the number of neighbors k is fixed to be k = 0.03 × sample size. [...] (v) Partial Identification Forest (PIDForest) (Gopalan et al., 2019) [...] with the number of trees T = 50, the number of buckets B = 5, and the depth of trees p = 10 suggested by the authors.
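The sample-size rules quoted in the Experiment Setup and Dataset Splits rows can be written as three small helpers. This is a minimal sketch using Python's standard `random` module. The function names are hypothetical, and the paper provides no source code. Two details are assumptions, not statements from the paper. First, 0.03 × n is floored to an integer, with a minimum of 1, for DTM. Second, the random division into B subsets is done as a uniform shuffle followed by contiguous slicing, with the larger subsets placed first.

```python
import random

def brdad_bagging_rounds(n):
    """Bagging rounds B for BRDAD by sample size n, following the quoted thresholds."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    if n <= 10_000:
        return 1
    if n <= 50_000:
        return 5
    return 10

def dtm_num_neighbors(n):
    """DTM neighbor count k = 0.03 * n; flooring to an integer (minimum 1) is an assumption."""
    return max(1, (3 * n) // 100)  # integer arithmetic avoids floating-point rounding

def random_bagging_subsets(n, B, seed=None):
    """Randomly partition sample indices 0..n-1 into B disjoint subsets whose
    sizes are floor(n/B) or floor(n/B) + 1 (hypothetical helper)."""
    if not 1 <= B <= n:
        raise ValueError("need 1 <= B <= n")
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)        # random assignment of samples to subsets
    base, extra = divmod(n, B)  # base = floor(n/B); the first `extra` subsets get one more sample
    subsets, start = [], 0
    for b in range(B):
        size = base + 1 if b < extra else base
        subsets.append(indices[start:start + size])
        start += size
    return subsets

n = 60_003
B = brdad_bagging_rounds(n)
parts = random_bagging_subsets(n, B, seed=0)
print(B, dtm_num_neighbors(n), sorted({len(p) for p in parts}))  # prints: 10 1800 [6000, 6001]
```

Giving the n mod B leftover samples to the first subsets is one simple way to meet the ⌊n/B⌋ or ⌊n/B⌋ + 1 size constraint. Any assignment with the same subset sizes would satisfy the quoted rule equally well.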