Anomaly Detection for Tabular Data with Internal Contrastive Learning

Authors: Tom Shenkar, Lior Wolf

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that our method leads by a sizable accuracy gap in comparison to the literature and that the same default rule of hyperparameters selection provides state-of-the-art results across benchmarks.
Researcher Affiliation | Academia | Tom Shenkar & Lior Wolf, Blavatnik School of Computer Science, Tel Aviv University
Pseudocode | No | No pseudocode or algorithm blocks were found.
Open Source Code | Yes | The full implementation of our method and scripts for reproducing the experiments are attached as a supplementary zip file. This archive includes a README file and a list of requirements that support seamless reproducibility.
Open Datasets | Yes | The second set employs the Multi-dimensional point datasets from the Outlier Detection Data Sets (ODDS) collection (http://odds.cs.stonybrook.edu/, accessed January 2021). It contains 31 datasets, including two of the four datasets above.
Dataset Splits | No | Following Zong et al. (2018) and Bergman & Hoshen (2020), the training set contains a random subset of 50% of the normal data. The test set contains the rest of the normal data, as well as all the anomalies. No explicit mention of a validation set split was found. (A sketch of this split protocol is given after the table.)
Hardware Specification | Yes | The Google Colab infrastructure was used to run the experiments. GPU: Tesla K80, 12 GB GDDR5 VRAM; CPU: single-core Xeon processor @ 2.3 GHz; RAM: 24 GB. For the GOAD baseline, which required more memory, we used a 32 GB GPU and 512 GB RAM.
Software Dependencies | No | The paper mentions a "list of requirements" in a supplementary zip file but does not explicitly list specific software dependencies with version numbers in the main text.
Experiment Setup | Yes | We fix τ = 0.01 and u = 200. The value of u, which is often much larger than k, provides enough capacity throughout all experiments, without the need to tune it for each problem. We set k proportionally to the input dimension d. For d smaller than 40, we set k = 2; for d in the range [40, 160] we employ k = 10; and for d > 160, k takes the value d − 150. Training employs the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 10⁻³. (A sketch of these defaults is given after the table.)
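
Below is a minimal sketch of the train/test split protocol quoted in the Dataset Splits row, assuming a label vector in which 0 marks normal samples and 1 marks anomalies. The function name and seed handling are illustrative and are not taken from the authors' released code.

```python
import numpy as np

def split_normal_anomaly(X, y, seed=0):
    """Train on a random 50% of the normal samples; test on the remaining
    normal samples plus all anomalies. Assumes y == 0 marks normal data."""
    rng = np.random.default_rng(seed)
    normal_idx = rng.permutation(np.flatnonzero(y == 0))
    anomaly_idx = np.flatnonzero(y == 1)
    half = len(normal_idx) // 2
    train_idx = normal_idx[:half]                              # 50% of normal data
    test_idx = np.concatenate([normal_idx[half:], anomaly_idx])  # rest of normal + all anomalies
    return X[train_idx], X[test_idx], y[test_idx]
```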
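
The defaults quoted in the Experiment Setup row can be read as the following rule. This is an illustrative sketch under our own naming: choose_k and the placeholder network are assumptions and do not reproduce the paper's architecture.

```python
import torch
import torch.nn as nn

TAU = 0.01  # temperature tau, fixed across all experiments
U = 200     # hidden width u, fixed across all experiments

def choose_k(d: int) -> int:
    """Sub-vector size k chosen from the input dimension d."""
    if d < 40:
        return 2
    if d <= 160:
        return 10
    return d - 150  # continuous with k = 10 at d = 160

d = 200                      # example input dimension
k = choose_k(d)              # -> 50
# Placeholder network, only to show the optimizer setting; not the paper's model.
net = nn.Sequential(nn.Linear(d - k, U), nn.ReLU(), nn.Linear(U, U))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
```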