Robust Misinformation Detection by Visiting Potential Commonsense Conflict

Authors: Bing Wang, Ximing Li, Changchun Li, Bingrui Zhao, Bo Fu, Renchu Guan, Shengsheng Wang

IJCAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We integrate MD-PCC with various existing MD backbones and compare them across 4 public benchmark datasets and CoMis. Empirical results demonstrate that MD-PCC can consistently outperform the existing MD baselines. The source code and data of MD-PCC are released in the repository https://github.com/wangbing1416/MD-PCC.
Researcher Affiliation Academia 1College of Computer Science and Technology, Jilin University 2Key Laboratory of Symbolic Computation and Knowledge Engineering of the MoE, Jilin University 3School of Computer Science and Artificial Intelligence, Liaoning Normal University
Pseudocode Yes Algorithm 1 Training summary of MD-PCC.
Open Source Code Yes The source code and data of MD-PCC are released in the repository https://github.com/wangbing1416/MD-PCC.
Open Datasets Yes For empirical evaluations, we employ 4 public benchmark datasets GossipCop [Shu et al., 2020], Weibo [Sheng et al., 2022], PolitiFact [Shu et al., 2020] and Snopes [Popat et al., 2017]. Additionally, we further collect a new Commonsense-oriented Misinformation benchmark dataset, named CoMis, in which all fake articles are caused by commonsense conflicts. ... The source code and data of MD-PCC are released in the repository https://github.com/wangbing1416/MD-PCC.
Dataset Splits Yes Table 2: Statistics of prevalent FND datasets and CoMis.
Dataset      Train (Fake / Real)   Val. (Fake / Real)   Test (Fake / Real)
Weibo        2,561 / 7,660         499 / 1,918          754 / 2,957
GossipCop    2,024 / 5,039         604 / 1,774          601 / 1,758
PolitiFact   1,224 / 1,344         170 / 186            307 / 337
Snopes       2,288 / 838           317 / 116            572 / 210
CoMis        560 / 440             170 / 125            162 / 123
Hardware Specification No The paper does not provide specific hardware details used for running its experiments.
Software Dependencies No In our experiments, we employ pre-trained language models FlanT5-Large [Chung et al., 2024] and mT5-Large [Xue et al., 2021] to extract commonsense triplets for the English and Chinese MD datasets, respectively. To generate golden objects, we use COMET-ATOMIC 2020 [Hwang et al., 2021] for English datasets and comet-atomic-zh for the Chinese datasets Weibo and CoMis. During the training stage, we use an Adam optimizer with a learning rate of 7 × 10−5 for the BERT model in baseline methods. For the other modules such as the linear classifier, we use a learning rate of 1 × 10−4, and the batch size is consistently fixed to 64.
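Since the paper names per-language model pairs but no exact checkpoints, a minimal sketch of the dependency wiring might look as follows. The function name and the Hugging Face checkpoint identifiers for the COMET generators are assumptions, not taken from the paper's repository:

```python
# Hypothetical sketch: map each dataset language to the triplet extractor
# and golden-object generator named in the paper. Checkpoint ids for the
# COMET models are illustrative placeholders.
def select_models(language: str) -> dict:
    """Return the extractor/generator pair for 'en' or 'zh' datasets."""
    models = {
        "en": {
            "extractor": "google/flan-t5-large",  # FlanT5-Large
            "generator": "comet-atomic-2020",     # COMET-ATOMIC 2020 (assumed id)
        },
        "zh": {
            "extractor": "google/mt5-large",      # mT5-Large
            "generator": "comet-atomic-zh",       # for Weibo and CoMis
        },
    }
    if language not in models:
        raise ValueError(f"unsupported language: {language}")
    return models[language]
```

With such a mapping, the English benchmarks (GossipCop, PolitiFact, Snopes) would use `select_models("en")` and the Chinese ones (Weibo, CoMis) `select_models("zh")`.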
Experiment Setup Yes During the training stage, we use an Adam optimizer with a learning rate of 7 × 10−5 for the BERT model in baseline methods. For the other modules, such as the linear classifier, we use a learning rate of 1 × 10−4, and the batch size is consistently fixed to 64. We also empirically fix the other manual parameters K, ϵ, and µ to 5, 0.8, and 0.6, respectively. To avoid overfitting of detectors, we adopt an early-stop strategy: training stops when no better Macro-F1 value appears for 10 epochs.
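The hyperparameters and stopping rule above can be collected into a small sketch. The values come from the text; the configuration dictionary and the early-stopping class are illustrative, not the authors' code:

```python
# Hyperparameters as reported in the paper's experiment setup.
CONFIG = {
    "lr_bert": 7e-5,    # Adam learning rate for the BERT backbone
    "lr_other": 1e-4,   # learning rate for other modules, e.g. the classifier
    "batch_size": 64,
    "K": 5,             # manual parameters fixed empirically
    "epsilon": 0.8,
    "mu": 0.6,
    "patience": 10,     # early-stop patience in epochs
}

class EarlyStopper:
    """Stop training when validation Macro-F1 fails to improve for `patience` epochs."""

    def __init__(self, patience: int = CONFIG["patience"]):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def update(self, macro_f1: float) -> bool:
        """Record one epoch's validation Macro-F1; return True when training should stop."""
        if macro_f1 > self.best:
            self.best = macro_f1
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `EarlyStopper.update` would be called once per epoch with the validation Macro-F1, and the loop would break as soon as it returns True.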