MTVHunter: Smart Contracts Vulnerability Detection Based on Multi-Teacher Knowledge Translation

Authors: Guokai Sun, Yuan Zhuang, Shuo Zhang, Xiaoyu Feng, Zhenguang Liu, Liguo Zhang

AAAI 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on 229,178 real-world smart contracts that concern four types of common vulnerabilities. Extensive experiments show MTVHunter achieves significant performance gains over state-of-the-art approaches. |
| Researcher Affiliation | Academia | (1) College of Computer Science and Technology, Harbin Engineering University, Heilongjiang, China; (2) The State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, China; (3) Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security, Hangzhou, China |
| Pseudocode | No | No explicit pseudocode or algorithm blocks are provided in the paper; the methodology is described in prose. |
| Open Source Code | Yes | The code is available at https://github.com/KDSCVD/MTVHunter. |
| Open Datasets | Yes | We collected 229,178 public smart contracts from the official Ethereum website. |
| Dataset Splits | Yes | Eventually, we manually labeled the ground truth in each category by auditing the source code of contracts, and split the data into 1,627 positive contracts and 5,860 negative contracts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as CPU/GPU models or memory specifications. |
| Software Dependencies | No | Concretely, we first employ the solc compiler to generate hexadecimal bytecode from source code, and then disassemble it into opcodes. Later, a CFG is constructed from the opcodes by an off-the-shelf symbolic execution tool, namely Octopus. The paper does not provide version numbers for these tools or for any other software dependencies. |
| Experiment Setup | No | The paper discusses various losses and hyperparameters (e.g., α and β for the multi-knowledge loss, the number of neurons for distillation), but it gives no concrete numerical values for them (e.g., learning rate, batch size, specific values for α and β, number of epochs) and no details about the optimizer, all of which are needed to reproduce the experimental setup. |