Beyond Text: Fine-Grained Multi-Modal Fact Verification with Hypergraph Transformers
Authors: Hui Pang, Chaozhuo Li, Litian Zhang, Senzhang Wang, Xi Zhang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on benchmark datasets demonstrate that our model significantly outperforms existing approaches in multi-modal fact verification. We evaluate our HGTMFC model against ten baseline methods, including five text-only approaches, two image-only approaches, and three multi-modal approaches. The primary experimental results are summarized in Table 1. |
| Researcher Affiliation | Academia | 1Key Laboratory of Trustworthy Distributed Computing and Service (MoE), Beijing University of Posts and Telecommunications, China 2School of Cyber Science and Technology, Beihang University, China 3School of Computer Science and Engineering, Central South University, China EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology in prose with figures (Figure 2 and Figure 3) and mathematical equations, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | In our experiments, due to the limited availability of relevant datasets, we selected the Mocheg dataset (Yao et al. 2023) to analyze multi-modal fact-checking. The dataset includes 15,601 claims from PolitiFact and Snopes, labeled for truthfulness and supported by verified ruling statements. Mocheg provides a diverse evidence base with 33,880 textual paragraphs and 12,112 images, making it ideal for multi-modal fact-checking. |
| Dataset Splits | Yes | The dataset is split into training, validation, and test sets, with 11,669, 1,490, and 2,442 samples respectively, and labels available for the first two. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used for running its experiments. It only mentions that all models were developed using PyTorch. |
| Software Dependencies | No | Our implementation was applied to both retrieved evidence and gold evidence, with all models developed using PyTorch. For text evidence retrieval, we utilize the SBERT model (Reimers and Gurevych 2019)... In the case of image retrieval, the CLIP model (Radford et al. 2021) generates feature representations... We employed the Adam optimizer... |
| Experiment Setup | Yes | The CLIP model's maximum text length was set to 77 tokens, and images were processed after cropping to 16×16 pixels. We employed the Adam optimizer with a learning rate of 2×10⁻⁶, a weight decay of 1×10⁻⁵, and a batch size of 16. Random node sampling was set to 50, the residual connection weight β was configured at 0.5, the reference set size p was established at 6, and the number of hypergraph propagation layers was set to 2. Early stopping was applied when the validation loss did not decrease within 20 epochs, with a maximum of 50 epochs allowed for training. |
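The early-stopping rule reported in the setup row (stop when validation loss has not decreased for 20 epochs, with a hard cap of 50 epochs) can be sketched as a small dependency-free helper. This is a hypothetical reconstruction for illustration only — the paper releases no code, and the class name `EarlyStopper` and its interface are assumptions, not the authors' implementation.

```python
class EarlyStopper:
    """Sketch of the reported early-stopping rule (hypothetical helper):
    halt when validation loss has not improved for `patience` epochs,
    or when `max_epochs` total epochs have elapsed."""

    def __init__(self, patience=20, max_epochs=50):
        self.patience = patience
        self.max_epochs = max_epochs
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0
        self.epoch = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        self.epoch += 1
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return (self.epochs_without_improvement >= self.patience
                or self.epoch >= self.max_epochs)
```

In a PyTorch training loop this would be called once per epoch after validation, breaking out of the loop when `step` returns `True`.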