Anomaly detection with semi-supervised classification based on risk estimators

Authors: Le Thi Khanh Hien, Sukanya Patra, Souhaib Ben Taieb

TMLR 2024

Reproducibility checklist: for each variable, the assessed result followed by the supporting LLM response.
Research Type: Experimental. "Our extensive experiments provide evidence of the effectiveness of the risk-based anomaly detection methods." (Section 6, Experiments)
Researcher Affiliation: Academia. Le Thi Khanh Hien (EMAIL), Department of Mathematics and Operational Research, University of Mons, Belgium; Sukanya Patra (EMAIL), Department of Computer Science, University of Mons, Belgium; Souhaib Ben Taieb (EMAIL), Department of Computer Science, University of Mons, Belgium.
Pseudocode: No. The paper presents its optimization problems in mathematical notation (equations (14) and (15)) but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code: No. The paper neither states that source code for the described methods is openly available nor provides a link to a code repository.
Open Datasets: Yes. "We test the algorithms on 26 classical anomaly detection benchmark datasets from Han et al. (2022), whose πn ranges from 0.02 to 0.4." "We test the algorithms on 3 benchmark k-classes-out datasets: MNIST, Fashion-MNIST, and CIFAR-10 (all have 10 classes)."
Dataset Splits: Yes. "We randomly split each dataset 30 times into train and test data with a ratio of 7:3, i.e. we have 30 trials for each dataset. Then, for each trial, we randomly select 5% of the train data to make the labeled data and keep the remaining 95% as unlabeled data." "For each πn ∈ {0.01, 0.05, 0.1, 0.2}, we set one of the ten classes to be a positive class, letting the remaining nine classes be anomalies and maintaining the ratio between normal instances and anomaly instances such that the setup has the required πn (so we have 10 setups corresponding to 10 classes). We note that the anomalous data in our generation process can originate from more than one of the nine classes (unlike in the setup of deep SAD, where the anomaly is only from one of the nine classes). For each πn, we repeat this generation process 2 times to get 20 AD setups (or 20 trials). Then, in each trial, we randomly choose a γl (with γl ∈ {0.05, 0.1, 0.2}) portion of the train data to be labeled and keep the remaining (1 − γl) portion as unlabeled data."
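The tabular-benchmark split procedure quoted above (70/30 train/test, then 5% of train labeled, repeated for 30 trials) can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' code; `split_trial` and the placeholder data are hypothetical names.

```python
import numpy as np

def split_trial(X, rng, labeled_frac=0.05, test_frac=0.3):
    """One trial: 70/30 train/test split, then labeled_frac of train labeled."""
    n = len(X)
    idx = rng.permutation(n)
    n_test = int(round(n * test_frac))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    n_labeled = int(round(len(train_idx) * labeled_frac))
    labeled_idx = train_idx[:n_labeled]      # 5% of train data, labeled
    unlabeled_idx = train_idx[n_labeled:]    # remaining 95%, unlabeled
    return labeled_idx, unlabeled_idx, test_idx

# 30 independent trials per dataset, as described in the paper
rng = np.random.default_rng(0)
X = np.zeros((1000, 8))  # placeholder feature matrix
trials = [split_trial(X, rng) for _ in range(30)]
```

With 1,000 instances, each trial yields 300 test points and a 700-point train set split into 35 labeled and 665 unlabeled points.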
Hardware Specification: No. The paper does not specify any particular hardware (e.g., GPU or CPU models, or cloud computing instances) used for running the experiments.
Software Dependencies: No. The paper mentions using ADAM for optimization and refers to existing implementations of deep SAD and nnPU, but does not provide specific version numbers for any software libraries or dependencies.
Experiment Setup: Yes. "For rAD, we use ℓ2 regularization and take ϕ(x) = x in (18), i.e. no kernel is used. We set a = 0.1 and π^e_p = 0.8 (π^e_n = 0.2) as default values for both the shallow rAD and the PU methods. For deep SAD and nnPU, we use default hyperparameter settings and network architectures as in their original implementation by the authors. We use the same network architectures as deep SAD for experiments on the Fashion-MNIST and MNIST datasets. For experiments on CIFAR-10, the network architecture from nnPU is used. In deep rAD, the optimization problem in (19) is solved using ADAM. We implement 4 losses for deep rAD: squared loss, sigmoid loss, logistic loss, and modified Huber loss. We set a = 0.1 and π^e_p = 0.8 (thus π^e_n = 0.2) as default values for deep rAD."
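The four surrogate losses named for deep rAD are standard margin-based losses. The definitions below are a sketch using the common forms from the PU-learning literature, written in terms of the margin z = y·f(x); they are our own illustrative choices, not taken from the paper, whose exact scalings may differ.

```python
import numpy as np

# z is the margin y * f(x); each loss penalizes small or negative margins.
def squared(z):
    return 0.25 * (z - 1.0) ** 2          # scaled squared loss

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(z))        # sigmoid loss, bounded in (0, 1)

def logistic(z):
    return np.log1p(np.exp(-z))           # logistic loss

def modified_huber(z):
    # quadratic hinge for z >= -1, linear continuation below
    z = np.asarray(z, dtype=float)
    return np.where(z >= -1.0, np.maximum(0.0, 1.0 - z) ** 2, -4.0 * z)
```

All four are decreasing in the margin and agree that a confidently correct prediction (large positive z) incurs near-zero loss; the modified Huber variant grows only linearly for large negative margins, which makes it less sensitive to outliers.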