Beyond Text: Fine-Grained Multi-Modal Fact Verification with Hypergraph Transformers

Authors: Hui Pang, Chaozhuo Li, Litian Zhang, Senzhang Wang, Xi Zhang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on benchmark datasets demonstrate that our model significantly outperforms existing approaches in multi-modal fact verification. We evaluate our HGTMFC model against ten baseline methods: five text-only approaches, two image-only approaches, and three multi-modal approaches. The primary experimental results are summarized in Table 1.
Researcher Affiliation | Academia | ¹Key Laboratory of Trustworthy Distributed Computing and Service (MoE), Beijing University of Posts and Telecommunications, China; ²School of Cyber Science and Technology, Beihang University, China; ³School of Computer Science and Engineering, Central South University, China
Pseudocode | No | The paper describes the methodology in prose with figures (Figure 2 and Figure 3) and mathematical equations, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology, nor does it include a link to a code repository.
Open Datasets | Yes | In our experiments, due to the limited availability of relevant datasets, we selected the Mocheg dataset (Yao et al. 2023) to analyze multi-modal fact-checking. The dataset includes 15,601 claims from PolitiFact and Snopes, labeled for truthfulness and supported by verified ruling statements. Mocheg provides a diverse evidence base with 33,880 textual paragraphs and 12,112 images, making it ideal for multi-modal fact-checking.
Dataset Splits | Yes | The dataset is split into training, validation, and test sets, with 11,669, 1,490, and 2,442 samples respectively; labels are available for the first two.
Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., GPU models, CPU types) used to run its experiments; it only mentions that all models were developed using PyTorch.
Software Dependencies | No | Our implementation was applied to both retrieved evidence and gold evidence, with all models developed using PyTorch. For text evidence retrieval, we utilize the SBERT model (Reimers and Gurevych 2019)... In the case of image retrieval, the CLIP model (Radford et al. 2021) generates feature representations... We employed the Adam optimizer...
Experiment Setup | Yes | The CLIP model's maximum text length was set to 77 tokens, and images were processed after cropping to 16×16 pixels. We employed the Adam optimizer with a learning rate of 2×10⁻⁶, a weight decay of 1×10⁻⁵, and a batch size of 16. Random node sampling was set to 50, the residual connection weight β was configured at 0.5, the reference set size p was established at 6, and the number of hypergraph propagation layers was set to 2. Early stopping was applied when the validation loss did not decrease within 20 epochs, with a maximum of 50 epochs allowed for training.
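The Software Dependencies row describes an embedding-based retrieval pipeline: SBERT encodes claims and textual evidence, CLIP encodes images, and evidence is presumably ranked by similarity to the claim. A minimal sketch of that ranking step, using toy vectors in place of real SBERT/CLIP embeddings (the function names and the top-k design are illustrative assumptions, not taken from the paper):

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rank_evidence(claim_vec, evidence_vecs, top_k=5):
    """Return indices of the top_k evidence vectors most similar to the claim.
    In the actual pipeline, claim_vec and evidence_vecs would come from
    SBERT (text) or CLIP (images); here they are placeholder lists."""
    order = sorted(range(len(evidence_vecs)),
                   key=lambda i: cosine(claim_vec, evidence_vecs[i]),
                   reverse=True)
    return order[:top_k]

claim = [1.0, 0.0, 1.0]
evidence = [[1.0, 0.1, 0.9],   # near-duplicate of the claim
            [0.0, 1.0, 0.0],   # orthogonal, irrelevant
            [0.5, 0.5, 0.5]]   # partially related
print(rank_evidence(claim, evidence, top_k=2))  # → [0, 2]
```

With real embeddings, the same ranking is usually done in bulk with a matrix product over L2-normalized vectors rather than a Python loop.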
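The Experiment Setup row fully determines the stopping schedule: halt when the validation loss has not decreased for 20 epochs, with a hard cap of 50 epochs. A self-contained sketch of that rule (only the patience of 20 and the 50-epoch cap come from the paper; the function name and structure are assumptions):

```python
def should_stop(val_losses, patience=20, max_epochs=50):
    """Decide whether training should halt, given the validation loss
    recorded after each completed epoch (val_losses[i] = loss at epoch i).

    Stops when either:
      - max_epochs epochs have elapsed, or
      - the best (lowest) validation loss is at least `patience`
        epochs in the past, i.e. no improvement for `patience` epochs.
    """
    if len(val_losses) >= max_epochs:
        return True
    if not val_losses:
        return False
    best_epoch = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return (len(val_losses) - 1 - best_epoch) >= patience

# Loss stalls after epoch 0: after 20 non-improving epochs, training stops.
stalled = [1.0] * 21
print(should_stop(stalled))        # → True

# Loss still improving: keep training.
improving = [1.0, 0.9, 0.8, 0.7]
print(should_stop(improving))      # → False
```

The optimizer settings in the same row would translate to `torch.optim.Adam(model.parameters(), lr=2e-6, weight_decay=1e-5)` under the stated PyTorch implementation.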