Automated Detection of Pre-training Text in Black-box LLMs

Authors: Ruihan Hu, Yu-Ming Shang, Jiankun Peng, Wei Luo, Yazhe Wang, Xi Zhang

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive evaluations on three widely used datasets demonstrate that our framework is effective and superior in the black-box setting." Section 5 is titled "Experiments" and describes the experimental setup, datasets, and results.
Researcher Affiliation | Academia | "1 Key Laboratory of Trustworthy Distributed Computing and Service (MoE), Beijing University of Posts and Telecommunications, China; 2 Zhongguancun Laboratory, China. EMAIL, EMAIL, EMAIL." The email domains 'bupt.edu.cn' and 'zgclab.edu.cn' indicate academic or public research affiliations.
Pseudocode | No | The paper describes its methods in text and figures but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | Code: github.com/STAIR-BUPT/VeilProbe
Open Datasets | Yes | WikiMIA [Shi et al., 2024] consists of Wikipedia event snippets. BookTection [Duarte et al., 2024] is a widely adopted dataset containing 165 copyrighted books, expanded from BookMIA [Shi et al., 2024]. arXivTection [Duarte et al., 2024] includes classic papers from arXiv.
Dataset Splits | Yes | "We randomly sample approximately 50 ground-truth samples per dataset to train the prototype-based classifier, with the remaining samples serving as the texts to be detected."
Hardware Specification | Yes | "In our work, all experiments are implemented on a workstation with five NVIDIA Tesla V100 32G GPUs, and Ubuntu 22.04.4."
Software Dependencies | No | The paper mentions Ubuntu 22.04.4 as the operating system but does not give versions for key software components or libraries (e.g., Python, PyTorch, CUDA) used in the experiments.
Experiment Setup | Yes | For each text to be detected, three suffixes were generated using the target LLM, with the maximum suffix length set to 512 tokens. The parameter γ is set to 10 for obtaining the perturbed text r. The p-value significance threshold is chosen from {0.001, 0.01, 0.05, 0.1} to select the critical perturbation calibration features. Approximately 50 ground-truth samples per dataset are randomly sampled to train the prototype-based classifier.
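The reported setup can be summarized as a small configuration sketch together with the ~50-sample split used for the prototype-based classifier. This is a minimal illustration: the constant names, dict layout, helper function, and seed below are our own assumptions, not taken from the released VeilProbe code.

```python
import random

# Hyperparameters quoted in the paper's experiment setup; the dict
# structure and key names are illustrative, not from the released code.
EXPERIMENT_CONFIG = {
    "num_suffixes": 3,           # suffixes generated per text by the target LLM
    "max_suffix_tokens": 512,    # maximum suffix length
    "gamma": 10,                 # perturbation parameter for the perturbed text r
    "p_value_grid": [0.001, 0.01, 0.05, 0.1],  # candidate significance thresholds
    "n_train": 50,               # ground-truth samples per dataset for the classifier
}

def split_ground_truth(samples, n_train=EXPERIMENT_CONFIG["n_train"], seed=0):
    """Randomly hold out ~n_train ground-truth samples to train the
    prototype-based classifier; the rest are the texts to be detected.
    (Hypothetical helper; the paper does not specify a seed.)"""
    rng = random.Random(seed)
    pool = list(samples)
    rng.shuffle(pool)
    return pool[:n_train], pool[n_train:]

# Example with 200 dummy sample IDs: 50 go to training, 150 to detection.
train, to_detect = split_ground_truth(range(200))
```

The split is the only detail of the protocol that is fully specified; the suffix-generation and perturbation-calibration steps depend on the target LLM and are not reproducible from the reported numbers alone.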