BotSim: LLM-Powered Malicious Social Botnet Simulation

Authors: Boyu Qiao, Kun Li, Wei Zhou, Shilong Li, Qianqian Lu, Songlin Hu

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental The experimental results indicate that detection methods effective on traditional bot datasets perform worse on BotSim-24, highlighting the urgent need for new detection strategies to address the cybersecurity threats posed by these advanced bots.
Researcher Affiliation Academia 1Institute of Information Engineering, Chinese Academy of Sciences; 2School of Cyber Security, University of Chinese Academy of Sciences
Pseudocode Yes A complete prompt example is provided in Appendix B.4, and the algorithm for this execution process is further explained in Appendix B.5.
Open Source Code Yes Code: https://github.com/QQQQQQBY/BotSim
Open Datasets No BotSim-24: LLM-driven Bot Detection Dataset In this section, we present BotSim-24, a bot detection dataset powered by LLMs. Building on the BotSim framework, we simulate information dissemination and user interactions across six subreddits on Reddit. This process results in the creation of the BotSim-24 dataset, which includes 1,907 human accounts and 1,000 LLM-driven agent bot accounts. [...] The BotSim-24 dataset does not include interactions between humans and bots. Statistics in Table 4 show that such interactions are also relatively sparse in actual OSNs. However, as LLM-powered bots become more prevalent, their highly human-like characteristics will inevitably lead to an increase in human-bot interactions. As demonstrated by our edge perturbation experiments, this trend will challenge and undermine the effectiveness of GNN-based methods. Furthermore, Table 5 offers a detailed overview of the performance of various LLMs on account detection tasks based on textual content. Additionally, Figure 4 in Appendix A.5 visually illustrates findings on the accuracy of human annotators. These results highlight the difficulty LLMs face in distinguishing between text they generate and text authored by humans; human annotators also struggle to achieve high accuracy in this regard. For additional details, please refer to Appendix A.5. This underscores the critical challenge of detecting LLM-driven bots and emphasizes the urgent need for innovative detection strategies to keep pace with their evolving capabilities.
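The edge-perturbation idea quoted above (injecting human-bot links to stress GNN-based detectors) can be sketched as follows. This is a minimal illustration, not the paper's exact protocol: the function name, edge representation, and perturbation count are assumptions.

```python
import random

def perturb_edges(edges, human_ids, bot_ids, n_new, seed=0):
    """Inject synthetic human-bot edges into an interaction graph.

    edges: set of (u, v) node-id pairs; human_ids / bot_ids: node id lists.
    Returns a new edge set with n_new extra human-bot links, mimicking the
    increased human-bot interaction that LLM-driven bots could cause.
    """
    rng = random.Random(seed)
    perturbed = set(edges)
    while len(perturbed) < len(edges) + n_new:
        perturbed.add((rng.choice(human_ids), rng.choice(bot_ids)))
    return perturbed

# Toy graph: nodes 0-2 are humans, 3-4 are bots; no cross-group edges yet.
edges = {(0, 1), (3, 4)}
new_edges = perturb_edges(edges, human_ids=[0, 1, 2], bot_ids=[3, 4], n_new=3)
```

A GNN-based detector trained on the unperturbed graph would then be evaluated on the perturbed one to measure how much added human-bot edges degrade its accuracy.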
Dataset Splits Yes Consistent with the division used in TwiBot-20 and MGTAB-22, we randomly divide all datasets into training, validation, and test sets with a ratio of 7:2:1. Table 2 shows the division of the BotSim-24 dataset.
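The 7:2:1 random split described in the response can be sketched as below; the function name and fixed seed are illustrative assumptions, and the account total follows from the 1,907 human plus 1,000 bot accounts reported for BotSim-24.

```python
import random

def split_indices(n, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle n account indices and split them into train/val/test
    at the 7:2:1 ratio used for BotSim-24 (as in TwiBot-20 and MGTAB)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# BotSim-24: 1,907 human + 1,000 bot accounts = 2,907 total.
train, val, test = split_indices(2907)
```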
Hardware Specification Yes Our experiments are conducted on four Tesla V100 GPUs with 32GB of memory.
Software Dependencies No Detailed hyperparameter settings can be found in Appendix A.1.
Experiment Setup No Detailed hyperparameter settings can be found in Appendix A.1.