Deep Kernel Relative Test for Machine-generated Text Detection
Authors: Yiliao Song, Zhenqiao Yuan, Shuhai Zhang, Zhen Fang, Jun Yu, Feng Liu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the superior performance of our method, compared to state-of-the-art non-parametric and parametric detectors. |
| Researcher Affiliation | Academia | School of Computer and Mathematical Sciences, The University of Adelaide, Adelaide, AU; School of Computing and Information Systems, University of Melbourne, Melbourne, AU; School of Software Engineering, South China University of Technology, Guangzhou, CN; Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, AU; School of Intelligence Science and Engineering, Harbin Institute of Technology, Shenzhen, CN |
| Pseudocode | Yes | Algorithm 1 Relative Test MGT Detection |
| Open Source Code | Yes | The code and demo are available: https://github.com/xLearn-AU/R-Detect. |
| Open Datasets | Yes | We design our experiments on data from five benchmarks: HC3 (Guo et al., 2023), TruthfulQA (TQA) (He et al., 2023; Lin et al., 2022), RAID (Dugan et al., 2024), and DetectRL (Wu et al., 2024). |
| Dataset Splits | Yes | In the default setting, we randomly take 512 tokens and repeat the experiments 10 × 10 times given a specific experimental design. During each round of detection in section 4.2 and section 4.3, we first shuffle the HC3 dataset and select the first 512 tokens from HWTs and the first 512 tokens from MGTs as the text to be tested (the token number will be 256 in the token-256 experiments). The default reference data will be the rest of the data. |
| Hardware Specification | Yes | We conduct our experiments using Python 3.9 and Pytorch 2.0 on a server with Intel Core i9 14900K and RTX 4090. |
| Software Dependencies | Yes | We conduct our experiments using Python 3.9 and Pytorch 2.0. |
| Experiment Setup | Yes | In Algorithm 3, we use the Adam optimizer (Kingma & Ba, 2015) to optimize the deep kernel parameters; we set λ to 10⁻⁸, the batch size to 200, and the learning rate to 0.00005 in all experiments. The default significance level for the hypothesis tests, both the two-sample test and the relative test, is α = 0.05 for deciding whether to reject the null hypothesis. |
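
The relative test summarized above decides whether a text sample is closer, in kernel distance, to a human-written reference set or a machine-generated reference set. The following is a minimal sketch of that decision rule using a fixed Gaussian kernel over generic feature vectors; it omits the paper's trained deep kernel and its asymptotic null distribution, and all function names and parameters (`bandwidth`, the toy feature dimensionality) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between rows of X and rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def mmd2(X, Y, bandwidth=1.0):
    # Estimate of squared MMD between samples X and Y
    # (diagonal terms excluded from the within-sample averages).
    kxx = gaussian_kernel(X, X, bandwidth)
    kyy = gaussian_kernel(Y, Y, bandwidth)
    kxy = gaussian_kernel(X, Y, bandwidth)
    n, m = len(X), len(Y)
    return ((kxx.sum() - np.trace(kxx)) / (n * (n - 1))
            + (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
            - 2 * kxy.mean())

def relative_test(sample, ref_hwt, ref_mgt, bandwidth=1.0):
    # Relative decision rule: label the sample by whichever reference
    # population it is closer to in MMD distance.
    d_hwt = mmd2(sample, ref_hwt, bandwidth)
    d_mgt = mmd2(sample, ref_mgt, bandwidth)
    return "HWT" if d_hwt < d_mgt else "MGT"

# Toy usage with synthetic feature vectors standing in for text embeddings.
rng = np.random.default_rng(0)
ref_hwt = rng.normal(0.0, 1.0, size=(50, 4))   # pretend human-written features
ref_mgt = rng.normal(3.0, 1.0, size=(50, 4))   # pretend machine-generated features
sample = rng.normal(3.0, 1.0, size=(20, 4))    # text to be tested
print(relative_test(sample, ref_hwt, ref_mgt))
```

In the paper the kernel is parameterized and trained (Adam, λ = 10⁻⁸, batch size 200, learning rate 0.00005) and the decision is calibrated at α = 0.05; this sketch replaces all of that with a fixed kernel and a simple nearest-distribution comparison to show the shape of the relative test only.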