MTVHunter: Smart Contracts Vulnerability Detection Based on Multi-Teacher Knowledge Translation
Authors: Guokai Sun, Yuan Zhuang, Shuo Zhang, Xiaoyu Feng, Zhenguang Liu, Liguo Zhang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on 229,178 real-world smart contracts that concern four types of common vulnerabilities. Extensive experiments show MTVHunter achieves significant performance gains over state-of-the-art approaches. |
| Researcher Affiliation | Academia | 1) College of Computer Science and Technology, Harbin Engineering University, Heilongjiang, China; 2) The State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, China; 3) Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security, Hangzhou, China |
| Pseudocode | No | No explicit pseudocode or algorithm blocks are provided in the paper. The methodology is described in prose. |
| Open Source Code | Yes | The codes are available at https://github.com/KDSCVD/MTVHunter. |
| Open Datasets | Yes | We collected 229,178 public smart contracts from the official Ethereum website. |
| Dataset Splits | Yes | Eventually, we manually labeled the ground truth in each category by auditing the source code of contracts, and split 1627 positive contracts and 5860 negative contracts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as CPU/GPU models or memory specifications. |
| Software Dependencies | No | Concretely, we first employ the solc compiler to generate hexadecimal bytecode from source code, and then disassemble it into opcodes. Later, a CFG is constructed with the opcodes by an off-the-shelf symbolic execution solver, namely Octopus. The paper does not provide specific version numbers for these tools or any other software dependencies. |
| Experiment Setup | No | While the paper discusses various losses and hyperparameters (e.g., α and β for multi-knowledge loss, number of neurons for distillation), it does not provide concrete numerical values for these hyperparameters (e.g., learning rate, batch size, specific values for α and β, number of epochs) or details about the optimizer used, which are essential for reproducing the experimental setup. |
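To make the preprocessing step concrete: the paper's pipeline compiles Solidity with solc to hexadecimal bytecode, then disassembles it into an opcode sequence before CFG construction with Octopus. Below is a minimal sketch of that disassembly step only. It is a toy disassembler covering a handful of EVM opcodes, not the paper's implementation; the `OPCODES` table and `disassemble` function are illustrative names, and a real pipeline would use solc and Octopus as the paper does.

```python
# Toy EVM disassembler: hex bytecode -> flat opcode sequence.
# Covers only a few opcodes; PUSH1..PUSH32 carry inline immediates.

OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x02: "MUL", 0x35: "CALLDATALOAD",
    0x50: "POP", 0x52: "MSTORE", 0x56: "JUMP", 0x57: "JUMPI",
    0x5B: "JUMPDEST", 0xF3: "RETURN", 0xFD: "REVERT",
}

def disassemble(hex_bytecode: str) -> list[str]:
    """Turn hex EVM bytecode into a list of opcode mnemonics."""
    code = bytes.fromhex(hex_bytecode.removeprefix("0x"))
    ops, i = [], 0
    while i < len(code):
        b = code[i]
        if 0x60 <= b <= 0x7F:  # PUSH1..PUSH32: next (b - 0x5F) bytes are data
            n = b - 0x5F
            arg = code[i + 1 : i + 1 + n].hex()
            ops.append(f"PUSH{n} 0x{arg}")
            i += 1 + n
        else:
            ops.append(OPCODES.get(b, f"UNKNOWN_0x{b:02x}"))
            i += 1
    return ops

# PUSH1 0x01, PUSH1 0x01, ADD
print(disassemble("0x6001600101"))
```

In practice one would obtain the hex bytecode via `solc --bin`, and Octopus (or a similar tool) would recover basic blocks and jump targets from the opcode stream to build the CFG.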