Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods

Authors: Kathleen C. Fraser, Hillary Dawkins, Svetlana Kiritchenko

JAIR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this survey, we summarize state-of-the-art approaches to AIGT detection, including watermarking, statistical and stylistic analysis, and machine learning classification. We also provide information about existing datasets for this task. Synthesizing the research findings, we aim to provide insight into the salient factors that combine to determine how detectable AIGT text is under different scenarios, and to make practical recommendations for future work towards this significant technical and societal challenge.
Researcher Affiliation | Academia | Kathleen C. Fraser EMAIL Hillary Dawkins EMAIL Svetlana Kiritchenko EMAIL National Research Council Canada 1200 Montreal Road, Ottawa, Canada
Pseudocode | No | The paper describes algorithms conceptually (e.g., watermarking, statistical analysis methods) but does not include any explicit pseudocode blocks, algorithms, or structured steps formatted like code.
Open Source Code | No | The paper mentions GitHub pages for other surveys in footnotes 2 and 3, but does not provide specific access to source code for the methodologies described within this survey paper itself.
Open Datasets | Yes | Table 3 lists some of the most frequently used datasets that include human-written and AI-generated texts. We observe that the vast majority of available datasets are in English, with another sizeable chunk being multilingual (including English as well as other languages). ... Table 3: Commonly used datasets with AI-generated texts. (e.g., tum-nlp/IDMGSP (Abdalla et al., 2023), GPABench2 (Liu et al., 2024), CHEAT (Yu et al., 2025))
Dataset Splits | No | As a survey paper, this work synthesizes existing research and does not present its own experimental setup or define dataset splits for new experiments.
Hardware Specification | No | As a survey paper, this work does not involve running experiments on specific hardware, and therefore no hardware specifications are provided.
Software Dependencies | No | As a survey paper, this work synthesizes existing research and does not implement new methodologies that would require specific software dependencies.
Experiment Setup | No | As a survey paper, this work synthesizes existing research findings and does not describe a specific experimental setup, including hyperparameters or system-level training settings, for its own methodology.