A Statistical Approach for Controlled Training Data Detection
Authors: Zirui Hu, Yingjie Wang, Zheng Zhang, Hong Chen, Dacheng Tao
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical experiments on real-world datasets, such as WikiMIA, XSum, and Real-Time BBC News, further validate KTD's superior performance compared to existing methods. |
| Researcher Affiliation | Academia | 1Generative AI Lab, College of Computing and Data Science, Nanyang Technological University 2College of Informatics, Huazhong Agricultural University |
| Pseudocode | No | The paper describes procedures and theorems (e.g., Proposition 1, Proposition 2, Theorem 1, Theorem 2, Lemma 1) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for our experiments is available at https://github.com/huzr1999/KTD |
| Open Datasets | Yes | We conduct our experiments on three datasets: WikiMIA (Shi et al., 2023) includes texts collected from Wikipedia events. ... XSum (Narayan et al., 2018) includes summaries of BBC news articles. ... BBC Real Time (Li et al., 2024b) includes BBC articles from January 2017 to August 2024. |
| Dataset Splits | Yes | WikiMIA ... The dataset is separated into two disjoint parts: one corresponding to events happening before 2017 and the other to events happening after 2023. These two parts are used as training samples and non-training samples, respectively. ... XSum ... We select the test set of this dataset and randomly separate it into two parts, corresponding to training and non-training samples. ... BBC Real Time ... we use the articles published in 2017 as training samples and articles published in 2024 as non-training samples. |
| Hardware Specification | Yes | All the experiments are run with a single NVIDIA Tesla V100 32GB GPU and a 10-core Intel Xeon (Skylake IBRS) CPU. |
| Software Dependencies | No | All codes are implemented with PyTorch (Paszke et al., 2019). ... All other hyperparameters were set to the default values provided by the Training Arguments class in the Transformers library. |
| Experiment Setup | Yes | For fine-tuning, we used the following settings: warmup step = 100; weight decay = 0.01; batch size = 8; num epochs = 3 (10 for GPT-2). ... For paraphrasing, we applied the following configurations: Top-k sampling with top_k = 50; Top-p sampling with top_p = 0.95; Temperature scaling with temperature = 1.9. |
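The reported fine-tuning and paraphrasing settings can be collected into a single configuration sketch. This is a hedged illustration assembled from the table above, not the authors' released code; the names `FINETUNE_ARGS`, `PARAPHRASE_SAMPLING`, and `epochs_for` are hypothetical, with keys mirroring the Hugging Face `TrainingArguments` / `generate()` parameter names the paper implies (defaults elsewhere are taken from the Transformers library).

```python
# Hypothetical configuration sketch of the reported experiment setup.
# Keys follow Hugging Face TrainingArguments / generate() naming, but this
# block is an illustration built from the paper's table, not the authors' code.

FINETUNE_ARGS = {
    "warmup_steps": 100,                # warmup step = 100
    "weight_decay": 0.01,               # weight decay = 0.01
    "per_device_train_batch_size": 8,   # batch size = 8
    "num_train_epochs": 3,              # 3 epochs (10 for GPT-2, per the paper)
}

PARAPHRASE_SAMPLING = {
    "do_sample": True,
    "top_k": 50,        # top-k sampling with top_k = 50
    "top_p": 0.95,      # nucleus (top-p) sampling with top_p = 0.95
    "temperature": 1.9, # temperature scaling with temperature = 1.9
}

def epochs_for(model_name: str) -> int:
    """Return the epoch count the paper reports for a given model family."""
    if model_name.lower().startswith("gpt-2"):
        return 10
    return FINETUNE_ARGS["num_train_epochs"]
```

All other hyperparameters would fall back to the `TrainingArguments` defaults, as the paper states.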