Training-free LLM-generated Text Detection by Mining Token Probability Sequences
Authors: Yihuai Xu, Yongwei Wang, Yifei Bi, Huangsen Cao, Zhouhan Lin, Yu Zhao, Fei Wu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on six datasets involving cross-domain, cross-model, and cross-lingual detection scenarios, under both white-box and black-box settings, demonstrated that our method consistently achieves state-of-the-art performance. |
| Researcher Affiliation | Academia | ¹Zhejiang University, ²Georgia Institute of Technology, ³Shanghai Jiao Tong University, ⁴Zhejiang Gongshang University |
| Pseudocode | No | The paper describes the methodology with mathematical formulations and a framework diagram (Figure 2), and outlines the detection process in three steps. However, it does not present a dedicated section or block formatted as pseudocode or a clear algorithm. |
| Open Source Code | Yes | The code and data are released at https://github.com/TrustMedia-zju/Lastde_Detector. |
| Open Datasets | Yes | The experiments conducted involved 6 distinct datasets, covering a range of languages and topics. Adhering to the setups of Fast-DetectGPT and DNA-GPT, we report the main detection results on 4 datasets: XSum (Narayan et al., 2018) (BBC News documents), SQuAD (Rajpurkar et al., 2016; 2018) (Wikipedia-based Q&A context), WritingPrompts (Fan et al., 2018) (for story generation), and Reddit ELI5 (Fan et al., 2019) (Q&A data restricted to the topics of biology, physics, chemistry, economics, law, and technique). |
| Dataset Splits | Yes | We prefer the latter approach and have fitted logistic regression models on datasets (including XSum, WritingPrompts, Reddit) generated by two closed-source models (GPT-4-Turbo, GPT-4o) and one open-source model (OPT-13B), reporting metrics on the test set (test size = 0.2). |
| Hardware Specification | Yes | Our experimental setup consists of two RTX 3090 GPUs (2 × 24 GB). |
| Software Dependencies | No | The paper lists various LLMs used as source and proxy models, with references to their technical reports or versions (e.g., GPT-4 (OpenAI, 2024b), Gemma (Team et al., 2024), GPT-J (Wang & Komatsuzaki, 2021)). However, it does not provide specific version numbers for general software dependencies such as programming languages (e.g., Python) or libraries (e.g., PyTorch, TensorFlow, Hugging Face Transformers) used for implementing their methodology. |
| Experiment Setup | Yes | Furthermore, for Lastde, the three hyperparameters are set to default values of s = 3, ε = 10n, τ = 5, where n is the number of tokens in the text. ... For Lastde++, the default settings are s = 4, ε = 8n, τ = 15. |
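The dataset-split row above describes fitting logistic regression models on detection scores and reporting metrics on a held-out 20% test split. A minimal sketch of that calibration step is shown below, using scikit-learn and synthetic 1-D scores; the variable names and the synthetic score distributions are illustrative assumptions, not taken from the paper's released code.

```python
# Sketch: calibrate a detector's scalar scores with logistic regression,
# holding out 20% of the data as a test set (test_size=0.2, as in the report).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic detection scores: human-written texts (label 0) vs LLM texts (label 1).
human_scores = rng.normal(loc=-1.0, scale=1.0, size=200)
llm_scores = rng.normal(loc=1.0, scale=1.0, size=200)
scores = np.concatenate([human_scores, llm_scores]).reshape(-1, 1)
labels = np.concatenate([np.zeros(200), np.ones(200)])

# 80/20 split, stratified so both classes appear in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    scores, labels, test_size=0.2, random_state=0, stratify=labels
)

clf = LogisticRegression().fit(X_train, y_train)
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"test AUROC: {auc:.3f}")
```

Since logistic regression on a single score is monotone, the AUROC here reflects the raw score's separability; the fitted model's main contribution is a calibrated probability threshold rather than a change in ranking.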