reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

EVICheck: Evidence-Driven Independent Reasoning and Combined Verification Method for Fact-Checking

Authors: Lingxiao Wang, Lei Shi, Feifei Kou, Ligu Zhu, Chen Ma, Pengfei Zhang, Mingying Xu, Zeyu Li

IJCAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results on the public RAWFC dataset demonstrate that EVICheck achieves state-of-the-art performance across all evaluation metrics. Our method demonstrates strong potential in fake news verification, significantly improving the accuracy. ... Section 4 Experimental Setting ... Section 5 Experimental Results
Researcher Affiliation	Academia	1 State Key Laboratory of Media Convergence and Communication, Communication University of China; 2 School of Computer Science (National Pilot School of Software Engineering), BUPT; 3 Key Laboratory of Trustworthy Distributed Computing and Service, BUPT, Ministry of Education; 4 Institute of Cyberspace Security, Zhejiang University of Technology; 5 State Key Laboratory of Digital Intelligent Technology for Unmanned Coal Mining, the School of Computer Science and Engineering, Anhui University of Science and Technology; 6 School of Artificial Intelligence and Computer Science, North China University of Technology
Pseudocode	Yes	Algorithm 1: Multi-Round Reasoning and Validation. Input: claim x, the number of iterations max_loops. Output: final prediction ŷ, explanation e. {q_i}_i^m ← GenerateQuestions(x); q_best ← SelectBestQuestion({q_i}_i^m, x); EvidenceSet ← []; counter ← 0; while counter < max_loops do: w ← RetrieveWebContent(q_best); (ŷ_current, e_current) ← Reasoning(w, q_best, x); EvidenceSet.append({ŷ_current, e_current}); {q_i}_i^m ← GenFollowQ(ŷ_current, e_current, x); q_best ← SelectBestQuestion({q_i}_i^m, x); counter ← counter + 1; end while; (ŷ, e) ← CombinedValidate(EvidenceSet, S); return ŷ, e
Open Source Code	No	The paper mentions using the LLaMA-Factory framework (with a link provided in a footnote) for fine-tuning, but does not state that the code for the EVICheck methodology itself is open-source or provide a link to its own implementation.
Open Datasets	Yes	Dataset. We adopt the English fake news dataset RAWFC [Yang et al., 2022] for experiments. The dataset was created by collecting claims from Snopes and retrieving the relevant raw reports. It includes three categories of labels: True, False, and Half, with each data entry provided with a manually annotated golden label explanation. The data distribution is shown in Table 2.
Dataset Splits	Yes	Table 2: RAWFC data statistics (false / half / true / total). train: 514 / 537 / 561 / 1612; test: 66 / 67 / 67 / 200; validation: 66 / 67 / 67 / 200.
Hardware Specification	No	The paper does not provide specific details about the hardware used, such as GPU or CPU models. It mentions using models like GPT-3.5 and GPT-4, but these are language models, not hardware specifications.
Software Dependencies	No	The paper mentions using the LLaMA-Factory framework (citing [Zheng et al., 2024]) and SerpApi, but it does not provide specific version numbers for these software components. The models GPT-3.5, GPT-4, and Llama-3-8B-Instruct are mentioned, but these are models, not versioned software dependencies in the sense of libraries or tools for replication.
Experiment Setup	Yes	To reduce computational overhead, [the experiment] was conducted based on M = 5 validation questions and performing N = 2 rounds of loop inference. ... The fine-tuning was performed for three epochs.
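
The control flow of Algorithm 1 quoted in the Pseudocode row can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation (the record above notes that no code was released). Every function name, signature, and toy stub is an assumption made for illustration. The default max_loops=2 mirrors the N = 2 rounds quoted under Experiment Setup, and the second argument S of Combined Validate is omitted because the quoted pseudocode does not define it.

```python
from collections import Counter
from typing import Callable, List, Tuple

Verdict = Tuple[str, str]  # (predicted label, explanation)


def multi_round_verify(
    claim: str,
    generate_questions: Callable[[str], List[str]],
    select_best: Callable[[List[str], str], str],
    retrieve: Callable[[str], str],
    reason: Callable[[str, str, str], Verdict],
    follow_up: Callable[[str, str, str], List[str]],
    combined_validate: Callable[[List[Verdict]], Verdict],
    max_loops: int = 2,
) -> Verdict:
    """Loop structure of Algorithm 1 with every step passed in as a callable."""
    # Generate candidate questions for the claim and keep the best one.
    q_best = select_best(generate_questions(claim), claim)
    evidence_set: List[Verdict] = []
    for _ in range(max_loops):
        web = retrieve(q_best)                           # fetch web content
        label, explanation = reason(web, q_best, claim)  # per-round verdict
        evidence_set.append((label, explanation))
        # Follow-up questions are conditioned on the current verdict.
        q_best = select_best(follow_up(label, explanation, claim), claim)
    # Combine the per-round verdicts into the final prediction.
    return combined_validate(evidence_set)


# Toy stubs (hypothetical) so the sketch runs without an LLM or a search API.
def toy_validate(evidence: List[Verdict]) -> Verdict:
    # Assumption: a simple majority vote stands in for Combined Validate.
    winner = Counter(label for label, _ in evidence).most_common(1)[0][0]
    return winner, " | ".join(text for _, text in evidence)


queries: List[str] = []  # records each retrieval query

result = multi_round_verify(
    "X",
    generate_questions=lambda c: [f"Is it true that {c}?", f"Who reported {c}?"],
    select_best=lambda qs, c: qs[0],
    retrieve=lambda q: queries.append(q) or "stub page",
    reason=lambda w, q, c: ("half", f"Based on: {q}"),
    follow_up=lambda y, e, c: [f"What else supports {c}?"],
    combined_validate=toy_validate,
)
```

Passing each step in as a callable keeps the loop independent of any particular model or retrieval service; a real pipeline would replace the stubs with LLM prompts and a search call.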