Is LLMs Hallucination Usable? LLM-based Negative Reasoning for Fake News Detection

Authors: Chaowei Zhang, Zongling Feng, Zewei Zhang, Jipeng Qiang, Guandong Xu, Yun Li

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The experimental results verified on three popular fake news datasets demonstrate the superiority of our method compared with three kinds of baselines, including prompting on LLMs, fine-tuning on pre-trained SLMs, and other representative fake news detection methods."
Researcher Affiliation | Academia | ¹Yangzhou University, ²Auburn University, ³The Education University of Hong Kong (author email addresses redacted)
Pseudocode | Yes | Algorithm 1: Self-Reinforced Reasoning Rectification.
  Input: news item x; label y; initial credibility score V_initial; type of requested reasoning T; alteration states S for the credibility score; the pair of initial positive reasoning and credibility score (R^p, V^p); the pair of initial negative reasoning and score (R^n, V^n).
  Parameters: polarity threshold of the credibility score M; expected increment for the confidence level I; maximum number of iterations Max_Iter.
  Output: the qualified reasoning {R^p, R^n}.
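The algorithm description above can be sketched as an iterative loop. This is a minimal illustration only, assuming a caller-supplied `query_llm` function and a simple qualification rule (score on the correct side of the threshold M, shifted by at least the increment I from V_initial); the names and the acceptance rule are assumptions, not the authors' implementation.

```python
def rectify_reasoning(news_item, label, v_initial, query_llm,
                      threshold_m=50, increment_i=5, max_iter=5):
    """Re-query the LLM until each reasoning's credibility score is
    'qualified': on the correct side of the polarity threshold M and
    shifted by at least the expected increment I from V_initial.
    If max_iter is exhausted, the last attempt is returned as-is."""
    qualified = {}
    for polarity in ("positive", "negative"):
        reasoning, score = query_llm(news_item, label, polarity)
        for _ in range(max_iter):
            # Positive reasoning should score at or above M; negative below.
            on_correct_side = (score >= threshold_m) == (polarity == "positive")
            if on_correct_side and abs(score - v_initial) >= increment_i:
                break  # reasoning is qualified; stop iterating
            reasoning, score = query_llm(news_item, label, polarity)
        qualified[polarity] = (reasoning, score)
    return qualified
```

In practice `query_llm` would prompt the locally deployed model for a reasoning chain plus a credibility score; here it is left abstract so the control flow mirrors the stated inputs (V_initial, M, I, Max_Iter) without guessing at prompt wording.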
Open Source Code | No | The paper does not provide an explicit statement or link to source code for the described methodology. It mentions using open-source LLMs such as OLlama 3 70B, Gemma 2 27B, and Mistral 7B, but these are external tools, not the authors' implementation code for SR3 or NRFE.
Open Datasets | Yes | "In this study, we deploy three widely adopted existing fake news datasets to conduct our experiments: Politifact (Wang 2017) and Twitter-15 & 16 (Yuan et al. 2019)."
Dataset Splits | Yes | "Each dataset is divided as 80% for training and 20% for testing."
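The reported 80/20 split can be reproduced with a short helper. The shuffling policy and seed below are assumptions (the paper does not state them), so this is a sketch of the split ratio only:

```python
import random

def split_80_20(items, seed=0):
    """Shuffle item indices and split into 80% train / 20% test.
    Seeded so the split is reproducible across runs."""
    rng = random.Random(seed)
    idx = list(range(len(items)))
    rng.shuffle(idx)
    cut = int(0.8 * len(items))
    return [items[i] for i in idx[:cut]], [items[i] for i in idx[cut:]]
```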
Hardware Specification | No | The paper mentions a "locally deployed LLM (OLlama 3 70B)" and "open-source LLMs (Llama 3, Gemma 2, Mistral)" but does not provide any specific hardware details (e.g., GPU models, CPU models, memory) used for the experiments or model training.
Software Dependencies | No | The paper mentions "pre-trained BERT models" and the "Adam optimizer" but does not specify any software with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions).
Experiment Setup | Yes | "In our experiments, we set up fixed hyper-parameters including the learning rate of the Adam optimizer (3 × 10⁻⁵), the dropout rate (0.3), and the number of epochs (30) for both the baselines and our model."
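The fixed hyper-parameters above can be collected into a single configuration. The dict layout and the constant-LR schedule helper are illustrative assumptions (the paper does not describe its training code), and the learning-rate value reflects the reported 3 × 10⁻⁵:

```python
# Hyper-parameters reported as fixed for both baselines and the proposed model.
TRAIN_CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 3e-5,  # reported as 3 x 10^-5
    "dropout": 0.3,
    "epochs": 30,
}

def constant_lr_schedule(config):
    """Yield (epoch, learning_rate) pairs; the paper fixes the LR,
    so the schedule is constant over all epochs."""
    for epoch in range(1, config["epochs"] + 1):
        yield epoch, config["learning_rate"]
```

A constant schedule is the simplest reading of "fixed hyper-parameters"; any warmup or decay would be an addition not supported by the text.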