Adversarial Subspace Generation for Outlier Detection in High-Dimensional Data
Authors: Jose Cribeiro-Ramallo, Federico Matteucci, Paul Enciu, Alexander Jenke, Vadim Arzamasov, Thorsten Strufe, Klemens Böhm
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on 42 real-world datasets show that using V-GAN subspaces to build ensemble methods leads to a significant increase in one-class classification performance compared to existing subspace selection, feature selection, and embedding methods. Further experiments on synthetic data show that V-GAN identifies subspaces more accurately while scaling better than other relevant subspace selection methods. |
| Researcher Affiliation | Academia | Jose Cribeiro-Ramallo (EMAIL, Karlsruhe Institute of Technology); Federico Matteucci (EMAIL, Karlsruhe Institute of Technology); Paul Enciu (EMAIL, Karlsruhe Institute of Technology); Alexander Jenke (EMAIL, Karlsruhe Institute of Technology); Vadim Arzamasov (EMAIL, Karlsruhe Institute of Technology); Thorsten Strufe (EMAIL, Karlsruhe Institute of Technology); Klemens Böhm (EMAIL, Karlsruhe Institute of Technology) |
| Pseudocode | Yes | A pseudo-code of the training is included in the Appendix |
| Open Source Code | Yes | Finally, we provide the code for all of our experiments and methods1. 1https://github.com/jcribeiro98/V-GAN |
| Open Datasets | Yes | We used 42 normalized datasets from the benchmark study by Han et al., listed in Tables 11-15 in the appendix. For those datasets with multiple versions, we chose the first in alphanumeric order. Details about each dataset are available in (Han et al., 2022). |
| Dataset Splits | Yes | 1. Split the dataset D into a training set Dtrain containing 80% of the inliers from D, and a test set Dtest containing the remaining 20% and the outliers. |
| Hardware Specification | Yes | Experiments ran on a Ryzen 9 7900X CPU and an Nvidia RTX 4090 GPU. |
| Software Dependencies | No | All experiments were implemented in Python. We used popular implementations for all competitors and baselines and implemented V-GAN in PyTorch. We used the Python package pyod for all outlier detectors. |
| Experiment Setup | Yes | We trained the network for 2000 epochs, with minibatch gradient descent using the Adadelta optimizer (Zeiler, 2012) following preliminary results. In particular, we use batches of size 500 and a learning rate of lr_G = lr_E = 0.007 for the generator and the encoder, respectively. We set momentum to 0.99 and weight decay to 0.04 (Goodfellow et al., 2016). Additionally, we updated E_ϕ once every 5 epochs. |
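The dataset-split protocol quoted above (train on 80% of the inliers; test on the remaining 20% of inliers plus all outliers) can be sketched as follows. This is a minimal illustration in NumPy, not the authors' code; the function name `one_class_split` and the label convention (0 = inlier, 1 = outlier) are assumptions for the sketch.

```python
import numpy as np

def one_class_split(X, y, train_frac=0.8, seed=0):
    """One-class classification split: the training set holds train_frac
    of the inliers (label 0); the test set holds the remaining inliers
    plus all outliers (label 1)."""
    rng = np.random.default_rng(seed)
    inliers = np.flatnonzero(y == 0)
    outliers = np.flatnonzero(y == 1)
    rng.shuffle(inliers)
    n_train = int(train_frac * len(inliers))
    train_idx = inliers[:n_train]
    test_idx = np.concatenate([inliers[n_train:], outliers])
    return X[train_idx], X[test_idx], y[test_idx]

# Toy data: 100 inliers, 10 outliers
X = np.random.default_rng(1).normal(size=(110, 5))
y = np.array([0] * 100 + [1] * 10)
X_train, X_test, y_test = one_class_split(X, y)
```

With 100 inliers and 10 outliers, this yields 80 training points and a 30-point test set containing all 10 outliers.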
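The quoted training configuration maps onto PyTorch's Adadelta optimizer roughly as below. This is a hedged sketch, not the authors' implementation: the `nn.Linear` modules are hypothetical stand-ins for V-GAN's generator and encoder (the real architectures are in the linked repository), and the quoted "momentum (0.99)" is read here as Adadelta's `rho` parameter, which is an assumption since `torch.optim.Adadelta` has no momentum argument.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in modules for the generator G and encoder E_phi.
generator = nn.Linear(16, 16)
encoder = nn.Linear(16, 16)

# Quoted hyperparameters: lr_G = lr_E = 0.007, weight decay 0.04;
# "momentum (0.99)" is mapped to Adadelta's rho here (an assumption).
opt_G = torch.optim.Adadelta(generator.parameters(),
                             lr=0.007, rho=0.99, weight_decay=0.04)
opt_E = torch.optim.Adadelta(encoder.parameters(),
                             lr=0.007, rho=0.99, weight_decay=0.04)

EPOCHS, BATCH_SIZE, ENC_UPDATE_EVERY = 2000, 500, 5

def update_encoder_this_epoch(epoch):
    # Per the quoted setup, E_phi steps only once every 5 epochs.
    return epoch % ENC_UPDATE_EVERY == 0
```

The actual training loop and adversarial losses are omitted; they are defined in the paper and its repository.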