SCALM: Detecting Bad Practices in Smart Contracts Through LLMs
Authors: Zongwei Li, Xiaoqi Li, Wenkai Li, Xin Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments using multiple LLMs and datasets have shown that SCALM outperforms existing tools in detecting bad practices in smart contracts. We conduct an experimental evaluation on SCALM, and the results show that the framework performs well and outperforms existing tools in detecting bad practices in smart contracts. At the same time, ablation experiments reveal that the RAG component significantly improves SCALM performance. |
| Researcher Affiliation | Academia | Zongwei Li, Xiaoqi Li*, Wenkai Li, Xin Wang School of Cyberspace Security, Hainan University, Haikou, 570228, China |
| Pseudocode | Yes | Algorithm 1: SCALM Algorithm |
| Open Source Code | Yes | We open source SCALM's codes and experimental data at https://figshare.com/s/5cc3639706e4ecd16724. |
| Open Datasets | Yes | Our data collection comes from the DAppSCAN database (Zheng et al. 2024b), which includes 39,904 smart contracts with 1,618 SWC weaknesses. The Smartbugs dataset (Durieux et al. 2020) is also used in the experiment, and a total of 1,894 smart contracts with five types of SWC weaknesses are extracted for comparison experiments. |
| Dataset Splits | No | The paper describes the datasets used (DAppSCAN and Smartbugs) and the number of samples for certain SWC categories in the evaluation (e.g., 94 positive samples for SWC-104, and 200 positive and 200 negative samples for the others). However, it does not specify explicit training, validation, or test splits for any model component of SCALM. The LLMs are used as-is or with prompting strategies, and DAppSCAN serves as a knowledge base. |
| Hardware Specification | Yes | All experiments are executed on a server equipped with NVIDIA GeForce GTX 4070Ti GPU, Intel(R) Core(TM) i9-13900KF CPU, and 128G RAM, operating on Ubuntu 22.04 LTS. |
| Software Dependencies | Yes | The software environment includes Python 3.9 and PyTorch 2.0.1. |
| Experiment Setup | No | The paper describes the overall SCALM framework, the LLMs selected for experiments, and evaluation metrics (Accuracy, Recall, F1 score). However, it does not provide specific hyperparameters such as learning rates, batch sizes, optimizers, or training epochs, which are typically part of a detailed experimental setup for training a model. |
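
The evaluation metrics named in the table (Accuracy, Recall, F1 score) can be illustrated with a minimal sketch. This is not the authors' evaluation script. The function name `binary_metrics` and the label convention (1 = bad practice present, 0 = absent) are assumptions made for illustration only.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, recall, and F1 for binary labels.

    Assumed convention (not taken from the paper): 1 means the
    contract contains the bad practice, 0 means it does not.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "recall": recall, "f1": f1}
```

Given per-contract ground-truth labels and tool predictions for one SWC category, calling `binary_metrics(labels, predictions)` returns the three scores as a dictionary.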