reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods

Authors: Kathleen C. Fraser, Hillary Dawkins, Svetlana Kiritchenko

JAIR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this survey, we summarize state-of-the-art approaches to AIGT detection, including watermarking, statistical and stylistic analysis, and machine learning classification. We also provide information about existing datasets for this task. Synthesizing the research findings, we aim to provide insight into the salient factors that combine to determine how detectable AIGT text is under different scenarios, and to make practical recommendations for future work towards this significant technical and societal challenge.
Researcher Affiliation	Academia	Kathleen C. Fraser, Hillary Dawkins, Svetlana Kiritchenko; National Research Council Canada, 1200 Montreal Road, Ottawa, Canada
Pseudocode	No	The paper describes algorithms conceptually (e.g., watermarking, statistical analysis methods) but does not include any explicit pseudocode blocks, algorithms, or structured steps formatted like code.
Open Source Code	No	The paper mentions GitHub pages for other surveys in footnotes 2 and 3, but does not provide specific access to source code for the methodologies described within this survey paper itself.
Open Datasets	Yes	Table 3 lists some of the most frequently used datasets that include human-written and AI-generated texts. We observe that the vast majority of available datasets are in English, with another sizeable chunk being multilingual (including English as well as other languages). ... Table 3: Commonly used datasets with AI-generated texts. (e.g., tum-nlp/IDMGSP (Abdalla et al., 2023), GPABench2 (Liu et al., 2024), CHEAT (Yu et al., 2025))
Dataset Splits	No	As a survey paper, this work synthesizes existing research and does not present its own experimental setup or define dataset splits for new experiments.
Hardware Specification	No	As a survey paper, this work does not involve running experiments on specific hardware, and therefore no hardware specifications are provided.
Software Dependencies	No	As a survey paper, this work synthesizes existing research and does not implement new methodologies that would require specific software dependencies.
Experiment Setup	No	As a survey paper, this work synthesizes existing research findings and does not describe a specific experimental setup, including hyperparameters or system-level training settings, for its own methodology.