DABL: Detecting Semantic Anomalies in Business Processes Using Large Language Models

Authors: Wei Guan, Jian Cao, Jianqi Gao, Haiyan Zhao, Shiyou Qian

AAAI 2025

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. Evidence: "Through extensive experiments, we demonstrate that DABL surpasses existing state-of-the-art semantic anomaly detection methods in terms of both generalization ability and learning of given processes."
Researcher Affiliation: Academia. Evidence: "Wei Guan¹, Jian Cao¹*, Jianqi Gao¹, Haiyan Zhao², Shiyou Qian¹. ¹Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China. ²Department of Computer Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China. EMAIL, EMAIL"
Pseudocode: No. The paper describes the methodology in prose and uses figures to illustrate concepts, but it does not include a dedicated pseudocode block or algorithm.
Open Source Code: Yes. Code: https://github.com/guanwei49/DABL
Open Datasets: Yes. Evidence: "We generate normal traces by playout of the real-world process models from the BPM Academic Initiative (BPMAI) (Weske et al. 2020), Fundamentals of Business Process Management (FBPM) (Dumas et al. 2018), and SAP Signavio Academic Models (SAP-SAM) (Sola et al. 2022)."
Dataset Splits: Yes. Evidence: "We allocate 1,000 process models for generating the test dataset D1. These models produce 14,387 normal traces, and we randomly simulate anomalies, resulting in 13,694 anomalous traces. In total, the test dataset D1 comprises 28,081 traces. From the 143,137 process models used for generating the training dataset, we randomly select 1,000 process models to create the test dataset D2. These 1,000 process models produce 21,298 normal traces, and we randomly simulate anomalies, resulting in 19,627 anomalous traces. In total, the test dataset D2 comprises 40,925 traces."
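The reported counts are internally consistent; the snippet below is an illustrative sanity check (not from the paper) confirming that normal plus anomalous traces sum to the stated totals for D1 and D2.

```python
# Sanity-check the trace counts reported for test datasets D1 and D2.
splits = {
    "D1": {"normal": 14_387, "anomalous": 13_694, "total": 28_081},
    "D2": {"normal": 21_298, "anomalous": 19_627, "total": 40_925},
}

for name, counts in splits.items():
    # Normal + anomalous must equal the reported total.
    assert counts["normal"] + counts["anomalous"] == counts["total"], name
    ratio = counts["anomalous"] / counts["total"]
    print(f"{name}: {counts['total']} traces, {ratio:.1%} anomalous")
```

Both test sets are close to balanced, with just under half of the traces anomalous.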
Hardware Specification: Yes. Evidence: "The fine-tuning is carried out on an NVIDIA A6000 GPU with 48 GB of memory."
Software Dependencies: No. The paper mentions fine-tuning the Llama 2-Chat 13B model with QLoRA and the Adam optimizer, but it does not specify version numbers for ancillary software such as Python, PyTorch/TensorFlow, or CUDA.
Experiment Setup: Yes. Evidence: "We employ the Adam optimizer (Kingma and Ba 2014) to fine-tune the LLMs for two epochs, setting the initial learning rate to 5 × 10⁻⁵ with polynomial learning rate decay. The mini-batch size is set to 64."
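The stated schedule (initial learning rate 5 × 10⁻⁵ with polynomial decay) can be sketched in plain Python. The decay power and total step count below are assumptions, since the paper does not report them; a power of 1.0 (linear decay) is a common default for polynomial schedulers.

```python
def polynomial_decay_lr(step, total_steps, lr_init=5e-5, lr_end=0.0, power=1.0):
    """Polynomial learning-rate decay from lr_init to lr_end over total_steps.

    power=1.0 yields linear decay; the paper does not state the power,
    so this default is an assumption for illustration.
    """
    step = min(step, total_steps)
    decay = (1.0 - step / total_steps) ** power
    return lr_end + (lr_init - lr_end) * decay

# Example over an assumed 1,000 optimizer steps.
print(polynomial_decay_lr(0, 1000))     # 5e-05 at the start
print(polynomial_decay_lr(500, 1000))   # 2.5e-05 halfway through
print(polynomial_decay_lr(1000, 1000))  # 0.0 at the end
```

In practice the same shape is obtained from a library scheduler (e.g. a polynomial decay schedule in Hugging Face Transformers), with the step count derived from the dataset size, batch size of 64, and two epochs.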