Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
DRL: Decomposed Representation Learning for Tabular Anomaly Detection
Authors: Hangting Ye, He Zhao, Wei Fan, Mingyuan Zhou, Dandan Guo, Yi Chang
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, extensive experiments on 40 tabular datasets and 16 competing tabular anomaly detection algorithms show that our method achieves state-of-the-art performance. |
| Researcher Affiliation | Collaboration | Hangting Ye¹, He Zhao², Wei Fan³, Mingyuan Zhou⁴, Dandan Guo¹, Yi Chang¹ ⁵ ⁶. ¹School of Artificial Intelligence, Jilin University; ²CSIRO's Data61; ³University of Oxford; ⁴The University of Texas at Austin; ⁵International Center of Future Science, Jilin University; ⁶Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China. EMAIL, EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | We provide the training and inference process of DRL in Algorithm 1 and 2 of Appendix A.3 respectively. |
| Open Source Code | Yes | The source code is available at https://github.com/HangtingYe/DRL. |
| Open Datasets | Yes | We conduct experiments on an extensive benchmark of 40 tabular anomaly detection datasets selected from Outlier Detection Data Sets (ODDS) (Rayana, 2016) and Anomaly Detection Benchmark (ADBench) (Han et al., 2022), following previous works (Yin et al., 2024; Thimonier et al., 2024). |
| Dataset Splits | Yes | Per the literature (Zong et al., 2018; Bergman & Hoshen, 2020; Yin et al., 2024; Thimonier et al., 2024), we construct the training set by randomly subsampling 50% of the normal samples. The remaining 50% of the normal samples are then combined with the entire set of anomalies to form the test set. *(See the split sketch below the table.)* |
| Hardware Specification | Yes | We provide the runtime in seconds of DRL for the training and inference phase on a single GTX 3090 GPU, as shown in Table 7. |
| Software Dependencies | No | The paper mentions 'Adam optimizer is employed' but does not specify versions for any programming languages, libraries, or other software components. |
| Experiment Setup | Yes | The DRL architecture remains consistent across all datasets. Specifically, the feature extractor f(·; θ_f): ℝ^D → ℝ^E and the weight learner ϕ(·; θ_ϕ): ℝ^D → ℝ^K are implemented as simple two-layer fully connected MLPs with LeakyReLU activations; the last layer of the weight learner uses a Softmax activation. The alignment learner g(·; θ_g): ℝ^E → ℝ^D is a linear layer. For the distance measurement d(·, ·), L2 distance is used for L_decomposition (Eq. 3) and cosine distance for L_separation (Eq. 4) and L_alignment (Eq. 5). The default number of basis vectors K is set to 5, and these basis vectors are not updated during training. λ1 and λ2 are set to 0.06 and 0.1 for the separation and alignment losses respectively. The hidden dimension E is set to 128, the batch size to 512, and the number of epochs to 200. The Adam optimizer is employed with an exponentially decaying learning-rate schedule initialized at 0.05. To reduce the effect of randomness, the reported performance is averaged over 10 independent runs. *(See the model/hyperparameter sketch below the table.)* |
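
The 50/50 split protocol quoted in the Dataset Splits row is mechanical enough to sketch. Below is a minimal NumPy version; the array names `X` and `y` and the convention that `y == 1` marks anomalies are our assumptions, not taken from the paper.

```python
import numpy as np

def split_tabular_ad(X, y, seed=0):
    """Train on a random half of the normal samples; test on the
    remaining normals plus all anomalies, as described in the paper.
    Assumes y == 0 for normal samples and y == 1 for anomalies."""
    rng = np.random.default_rng(seed)
    normal_idx = np.flatnonzero(y == 0)
    anomaly_idx = np.flatnonzero(y == 1)

    rng.shuffle(normal_idx)
    half = len(normal_idx) // 2
    train_idx = normal_idx[:half]                                 # 50% of the normals
    test_idx = np.concatenate([normal_idx[half:], anomaly_idx])   # remaining normals + all anomalies
    return X[train_idx], X[test_idx], y[test_idx]
```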
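
The Experiment Setup row pins down enough of the architecture and hyperparameters for a PyTorch sketch of the three components. The hidden width of the two-layer MLPs, the basis-vector initialization, and the decay factor of the learning-rate schedule are not specified in the quote and are assumptions here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D, E, K = 100, 128, 5  # D is dataset-specific; E = 128 hidden dim, K = 5 basis vectors

def two_layer_mlp(d_in, d_out):
    # "simple two-layer fully connected MLP with LeakyReLU activation"
    # (hidden width = d_out is an assumption)
    return nn.Sequential(nn.Linear(d_in, d_out), nn.LeakyReLU(), nn.Linear(d_out, d_out))

feature_extractor = two_layer_mlp(D, E)                                  # f(.; θ_f): R^D -> R^E
weight_learner = nn.Sequential(two_layer_mlp(D, K), nn.Softmax(dim=-1))  # ϕ(.; θ_ϕ): R^D -> R^K
alignment_learner = nn.Linear(E, D)                                      # g(.; θ_g): R^E -> R^D

# K basis vectors, frozen during training (initialization scheme assumed)
basis = torch.randn(K, E)

# distances from the quote: L2 for decomposition, cosine for separation/alignment
l2_dist = lambda a, b: (a - b).pow(2).sum(dim=-1)
cos_dist = lambda a, b: 1.0 - F.cosine_similarity(a, b, dim=-1)

params = (list(feature_extractor.parameters())
          + list(weight_learner.parameters())
          + list(alignment_learner.parameters()))
optimizer = torch.optim.Adam(params, lr=0.05)  # Adam, initial learning rate 0.05
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)  # decay rate assumed

lambda1, lambda2 = 0.06, 0.1   # weights for separation and alignment losses
batch_size, num_epochs = 512, 200
```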