SCALM: Detecting Bad Practices in Smart Contracts Through LLMs

Authors: Zongwei Li, Xiaoqi Li, Wenkai Li, Xin Wang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments using multiple LLMs and datasets show that SCALM outperforms existing tools in detecting bad practices in smart contracts. Ablation experiments further reveal that the RAG component significantly improves SCALM's performance.
Researcher Affiliation | Academia | Zongwei Li, Xiaoqi Li*, Wenkai Li, Xin Wang; School of Cyberspace Security, Hainan University, Haikou, 570228, China
Pseudocode | Yes | Algorithm 1: SCALM Algorithm
Open Source Code | Yes | We open-source SCALM's code and experimental data at https://figshare.com/s/5cc3639706e4ecd16724.
Open Datasets | Yes | The data collection comes from the DAppSCAN database (Zheng et al. 2024b), which includes 39,904 smart contracts with 1,618 SWC weaknesses. The SmartBugs dataset (Durieux et al. 2020) is also used; a total of 1,894 smart contracts covering five types of SWC weaknesses are extracted for the comparison experiments.
Dataset Splits | No | The paper describes the datasets used (DAppSCAN and SmartBugs) and the number of samples for certain SWC categories in the evaluation (e.g., 94 positive samples for SWC-104, and 200 positive and 200 negative samples for the others). However, it does not specify explicit training, validation, or test splits, since the authors train no model components within SCALM: the LLMs are used as-is or with prompting strategies, and DAppSCAN serves as a knowledge base.
Hardware Specification | Yes | All experiments are executed on a server equipped with an NVIDIA GeForce RTX 4070 Ti GPU, an Intel(R) Core(TM) i9-13900KF CPU, and 128 GB RAM, running Ubuntu 22.04 LTS.
Software Dependencies | Yes | The software environment includes Python 3.9 and PyTorch 2.0.1.
Experiment Setup | No | The paper describes the overall SCALM framework, the LLMs selected for the experiments, and the evaluation metrics (Accuracy, Recall, F1 score). However, it does not provide specific hyperparameters such as learning rates, batch sizes, optimizers, or training epochs, which would normally constitute a detailed experimental setup for model training.
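The evaluation metrics named above (Accuracy, Recall, F1 score) can be made concrete with a minimal sketch. This is not the authors' code; it simply computes the three standard metrics for a binary bad-practice-detection task, where label 1 marks a contract flagged as containing a weakness:

```python
def detection_metrics(y_true, y_pred):
    """Return (accuracy, recall, f1) for binary labels (1 = bad practice found)."""
    # Tally the four outcomes of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, recall, f1

# Hypothetical example: 4 contracts, one weakness missed (a false negative).
acc, rec, f1 = detection_metrics([1, 1, 0, 0], [1, 0, 0, 0])
# acc = 0.75, rec = 0.5
```

F1, the harmonic mean of precision and recall, is the headline number for comparisons like those in the paper because it penalizes both missed weaknesses and false alarms.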