Training-free LLM-generated Text Detection by Mining Token Probability Sequences

Authors: Yihuai Xu, Yongwei Wang, Yifei Bi, Huangsen Cao, Zhouhan Lin, Yu Zhao, Fei Wu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on six datasets involving cross-domain, cross-model, and cross-lingual detection scenarios, under both white-box and black-box settings, demonstrated that our method consistently achieves state-of-the-art performance.
Researcher Affiliation | Academia | Zhejiang University; Georgia Institute of Technology; Shanghai Jiao Tong University; Zhejiang Gongshang University
Pseudocode | No | The paper describes the methodology with mathematical formulations and a framework diagram (Figure 2) and outlines the detection process in three steps; however, it does not include a dedicated pseudocode block or formally stated algorithm.
Open Source Code | Yes | The code and data are released at https://github.com/TrustMedia-zju/Lastde_Detector.
Open Datasets | Yes | The experiments involved 6 distinct datasets covering a range of languages and topics. Adhering to the setups of Fast-DetectGPT and DNA-GPT, we report the main detection results on 4 datasets: XSum (Narayan et al., 2018) (BBC News documents), SQuAD (Rajpurkar et al., 2016; 2018) (Wikipedia-based Q&A context), WritingPrompts (Fan et al., 2018) (story generation), and Reddit ELI5 (Fan et al., 2019) (Q&A data restricted to the topics of biology, physics, chemistry, economics, law, and technique).
Dataset Splits | Yes | We prefer the latter approach and have fitted logistic regression models on datasets (including XSum, WritingPrompts, Reddit) generated by two closed-source models (GPT-4-Turbo, GPT-4o) and one open-source model (OPT-13B), reporting metrics on the test set (test size = 0.2).
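The split described above (fit a logistic regression on detector scores, evaluate on a 20% held-out test set) can be sketched as follows. This is a hypothetical illustration, not the authors' code: the scores are synthetic stand-ins drawn from two Gaussians rather than actual Lastde statistics, and the regression is fitted with plain gradient descent to keep the example self-contained.

```python
import math
import random

random.seed(0)
# Synthetic 1-D detector scores: label 0 = human-written, 1 = LLM-generated.
data = [(random.gauss(-1.0, 1.0), 0) for _ in range(500)]
data += [(random.gauss(1.0, 1.0), 1) for _ in range(500)]
random.shuffle(data)

split = int(0.8 * len(data))  # hold out 20% for evaluation (test size = 0.2)
train, test = data[:split], data[split:]

# One-feature logistic regression fitted by batch gradient descent.
w, b, lr = 0.0, 0.0, 0.5
for _ in range(300):
    gw = gb = 0.0
    for x, y in train:
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted P(LLM-generated)
        gw += (p - y) * x
        gb += p - y
    w -= lr * gw / len(train)
    b -= lr * gb / len(train)

# Classify by the sign of the logit; report held-out accuracy.
correct = sum((w * x + b > 0.0) == (y == 1) for x, y in test)
accuracy = correct / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

In practice the 1-D feature would be the detector's score for each passage, and the fitted threshold replaces a hand-tuned decision boundary.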
Hardware Specification | Yes | Our experimental setup consists of two RTX 3090 GPUs (2 × 24 GB).
Software Dependencies | No | The paper lists the LLMs used as source and proxy models, with references to their technical reports or versions (e.g., GPT-4 (OpenAI, 2024b), Gemma (Team et al., 2024), GPT-J (Wang & Komatsuzaki, 2021)). However, it does not provide version numbers for general software dependencies such as the programming language (e.g., Python) or libraries (e.g., PyTorch, TensorFlow, Hugging Face Transformers) used to implement the methodology.
Experiment Setup | Yes | Furthermore, for Lastde, the 3 hyperparameters are set to default values of s = 3, ε = 10 n, τ = 5, where n is the number of tokens in the text. ... For Lastde++, the default settings are s = 4, ε = 8 n, τ = 15.