FoMo-0D: A Foundation Model for Zero-shot Tabular Outlier Detection
Authors: Yuchen Shen, Haomin Wen, Leman Akoglu
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on 57 real-world datasets against 26 baselines show that FoMo-0D is highly competitive, outperforming the majority of the baselines with no statistically significant difference from the 2nd-best method. Further, FoMo-0D is efficient at inference, requiring only 7.7 ms per sample on average, with at least a 7× speedup over previous methods. |
| Researcher Affiliation | Academia | Yuchen Shen (Carnegie Mellon University), Haomin Wen (Carnegie Mellon University), Leman Akoglu (Carnegie Mellon University) |
| Pseudocode | Yes | Details are outlined in Algorithm 1 in Appendix C, and described as follows. At each time, we first draw a hypothesis (i.e. GMM configuration) uniformly at random, that is, ϕ = {d ∈ [D], m ∈ [M], {µ_j}_{j=1}^m ⊂ [−5, 5]^d, {Σ_j}_{j=1}^m with diag(Σ_j) ∈ [−5, 5]^d}, and then generate a synthetic dataset D = {D^in, D^out} containing synthetic inlier and outlier samples from the drawn hypothesis and its variance-inflated variant, respectively. We optimize FoMo-0D's parameters θ to make predictions on D_test = {D^in_test, D^out_test}, conditioned on the inlier-only training data D_train ⊂ D^in, based on the cross-entropy loss (see Eq. (2)). During training, D_test contains a balanced number of inlier and outlier samples, where D^in_test = D^in \ D_train, and D^out_test ⊂ D^out contains an equal number of samples as D^in_test. To vary the training data size, we subsample D_train of randomly drawn size n ∈ [n_L, n_U], where n_L and n_U denote the lower and upper bounds. In our implementation, we use n_L = 500 and n_U = 5,000. FoMo-0D is trained on 200,000 batches (200 epochs × 1,000 steps/epoch) of B = 8 generated datasets in each batch. While this pre-training phase can be expensive, it is done only once, offline. Moreover, we introduce several scalability improvements to speed up pre-training, as discussed later in Section 3.3. Full details on the training and implementation of FoMo-0D are given in Appendix C. |
| Open Source Code | Yes | To facilitate future research, our implementations for data synthesis and pre-training as well as model checkpoints are openly available at https://github.com/A-Chicharito-S/FoMo-0D. |
| Open Datasets | Yes | While pre-training is purely on synthetic datasets, we evaluate FoMo-0D on 57 real-world datasets from ADBench (Han et al., 2022) (see Table 20 in Appendix J). |
| Dataset Splits | Yes | Following Livernoche et al. (2024), we use 5 train/test splits of each dataset via different seeds and report mean performance and standard deviation. In particular, each random split designates 50% of the inliers as Dtrain, while Dtest contains the rest of the inliers and all the outlier samples. |
| Hardware Specification | Yes | We base our experiments on an NVIDIA RTX A6000 GPU with AMD EPYC 7742 64-Core Processors. |
| Software Dependencies | No | We train our models for 200 epochs with the Adam optimizer (Kingma & Ba, 2017) and a learning_rate = 0.001, and test with the model corresponding to the lowest training loss. |
| Experiment Setup | Yes | We train our models for 200 epochs with the Adam optimizer (Kingma & Ba, 2017) and learning_rate = 0.001, and test with the model corresponding to the lowest training loss. The sizes of our D = 20 and D = 100 models are 4.87M and 4.89M parameters, respectively. ... Model architecture: We use a 4-layer Transformer with hidden dimension h_dim = 256, a linear embedding layer at the input (R^D → R^{h_dim}), and a 2-layer MLP at the output (R^{h_dim} → R^2) for inlier vs. outlier binary classification. For each Transformer layer, we use num_head = 4 for each attention module and R = 500 for the router-based attention (Figure 2). |
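The data-synthesis step quoted in the Pseudocode row (draw a GMM hypothesis uniformly at random, sample inliers from it and outliers from a variance-inflated variant) can be sketched in plain Python. This is an illustrative reconstruction, not the authors' implementation: the function names, the choice M = 5, the positive-variance range, and the inflation factor 9 are assumptions.

```python
import random
import math

def draw_hypothesis(D=20, M=5, scale=5.0):
    """Draw a GMM configuration uniformly at random: dimensionality d in [D],
    number of components m in [M], component means in [-scale, scale]^d.
    Variances are kept positive here (an assumption; the paper states the
    diagonal-covariance range alongside the means)."""
    d = random.randint(1, D)
    m = random.randint(1, M)
    means = [[random.uniform(-scale, scale) for _ in range(d)] for _ in range(m)]
    variances = [[random.uniform(0.01, scale) for _ in range(d)] for _ in range(m)]
    return d, m, means, variances

def sample_gmm(n, d, m, means, variances, inflate=1.0):
    """Sample n points from the GMM; inflate > 1 widens every component's
    variance, mimicking the variance-inflated variant used for outliers."""
    points = []
    for _ in range(n):
        j = random.randrange(m)  # pick a mixture component uniformly
        points.append([random.gauss(means[j][k], math.sqrt(inflate * variances[j][k]))
                       for k in range(d)])
    return points

random.seed(0)
d, m, mu, var = draw_hypothesis()
D_in = sample_gmm(100, d, m, mu, var)             # synthetic inliers
D_out = sample_gmm(20, d, m, mu, var, inflate=9)  # synthetic outliers (hypothetical factor)
```

In the paper's setup, many such (inlier, outlier) dataset pairs are generated per training batch, so the sampler above would sit inside the pre-training loop.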
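The evaluation protocol in the Dataset Splits row (per seed, 50% of inliers form D_train; D_test is the remaining inliers plus all outliers; 5 seeded splits) can be sketched as follows. Names and the toy data are illustrative, not the ADBench loader.

```python
import random

def split_dataset(inliers, outliers, seed):
    """One seeded split: half the inliers (at random) become the training
    set; the rest of the inliers plus all outliers form the test set."""
    rng = random.Random(seed)
    idx = list(range(len(inliers)))
    rng.shuffle(idx)
    half = len(inliers) // 2
    d_train = [inliers[i] for i in idx[:half]]
    d_test = [inliers[i] for i in idx[half:]] + list(outliers)
    return d_train, d_test

# five seeded splits, mirroring the 5-run protocol used for mean/std reporting
toy_inliers = list(range(10))
toy_outliers = ["o1", "o2"]
splits = [split_dataset(toy_inliers, toy_outliers, seed=s) for s in range(5)]
```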