Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods
Authors: Kathleen C. Fraser, Hillary Dawkins, Svetlana Kiritchenko
JAIR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this survey, we summarize state-of-the-art approaches to AIGT detection, including watermarking, statistical and stylistic analysis, and machine learning classification. We also provide information about existing datasets for this task. Synthesizing the research findings, we aim to provide insight into the salient factors that combine to determine how detectable AIGT text is under different scenarios, and to make practical recommendations for future work towards this significant technical and societal challenge. |
| Researcher Affiliation | Academia | Kathleen C. Fraser, Hillary Dawkins, Svetlana Kiritchenko; National Research Council Canada, 1200 Montreal Road, Ottawa, Canada |
| Pseudocode | No | The paper describes algorithms conceptually (e.g., watermarking, statistical analysis methods) but does not include any explicit pseudocode blocks, algorithms, or structured steps formatted like code. |
| Open Source Code | No | The paper mentions GitHub pages for other surveys in footnotes 2 and 3, but does not provide specific access to source code for the methodologies described within this survey paper itself. |
| Open Datasets | Yes | Table 3 lists some of the most frequently used datasets that include human-written and AI-generated texts. We observe that the vast majority of available datasets are in English, with another sizeable chunk being multilingual (including English as well as other languages). ... Table 3: Commonly used datasets with AI-generated texts. (e.g., tum-nlp/IDMGSP (Abdalla et al., 2023), GPABench2 (Liu et al., 2024), CHEAT (Yu et al., 2025)) |
| Dataset Splits | No | As a survey paper, this work synthesizes existing research and does not present its own experimental setup or define dataset splits for new experiments. |
| Hardware Specification | No | As a survey paper, this work does not involve running experiments on specific hardware, and therefore no hardware specifications are provided. |
| Software Dependencies | No | As a survey paper, this work synthesizes existing research and does not implement new methodologies that would require specific software dependencies. |
| Experiment Setup | No | As a survey paper, this work synthesizes existing research findings and does not describe a specific experimental setup, including hyperparameters or system-level training settings, for its own methodology. |