Multi-modal Deepfake Detection via Multi-task Audio-Visual Prompt Learning
Authors: Hui Miao, Yuanfang Guo, Zeming Liu, Yunhong Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments demonstrate the effectiveness and superior generalization ability of our method against the state-of-the-art methods. ... Section headings: Experiments; Experiment Settings; Datasets; Implementation Details; Comparisons with the Existing Methods; Intra-dataset Evaluation; Cross-manipulation Evaluation; Cross-dataset Evaluation; Ablation Study |
| Researcher Affiliation | Academia | Hui Miao, Yuanfang Guo*, Zeming Liu, Yunhong Wang, School of Computer Science and Engineering, Beihang University, China |
| Pseudocode | No | The paper describes methods using mathematical formulations and textual descriptions but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | To validate the performance of our method, we evaluate our model on FakeAVCeleb (Khalid et al. 2021b)... we also evaluate the methods on a subset of KoDF (Kwon et al. 2021) to assess the cross-dataset generalization. |
| Dataset Splits | Yes | Specifically, the training set consists of South Asian, East Asian, and American Caucasian, the validation set contains African, and the testing set contains European Caucasian. ... In addition, by following (Feng, Chen, and Owens 2023; Oorloff et al. 2024), we also evaluate the methods on a subset of KoDF (Kwon et al. 2021) to assess the cross-dataset generalization. |
| Hardware Specification | Yes | Note that our method is trained on a single RTX 3080ti GPU with 25G CPU memory on Ubuntu 20.04. |
| Software Dependencies | No | The paper mentions 'Ubuntu 20.04' as the operating system, but does not specify any programming languages, libraries, or other software dependencies with version numbers. |
| Experiment Setup | Yes | The number of the visual prompt tokens Nvpt is set to 1. The weights of the CMFM loss are set to α = 2, β = 2, γ = 1. For the training process, we randomly sample segments from each video and utilize 15 training epochs with the Adam (Kingma and Ba 2014) optimizer. The initial learning rate is set to 0.0001, with a reduction by a factor of 10 occurring at the 12th epoch. |
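The Experiment Setup row implies a simple step learning-rate schedule: a constant rate of 1e-4 that drops by a factor of 10 at epoch 12, over 15 total epochs. A minimal framework-agnostic sketch of that schedule (the function name and defaults are ours, not the paper's):

```python
def learning_rate(epoch, base_lr=1e-4, drop_epoch=12, factor=10):
    """Step LR schedule matching the reported setup:
    base_lr until drop_epoch, then base_lr / factor afterwards."""
    return base_lr if epoch < drop_epoch else base_lr / factor

# Per-epoch learning rates over the 15 reported training epochs
schedule = [learning_rate(e) for e in range(15)]
```

In a deep-learning framework this would typically be wired into the optimizer via a step scheduler (e.g. a step decay with step size 12 and decay 0.1) rather than called manually.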