Enabling Users to Falsify Deepfake Attacks
Authors: Tal Reiss, Bar Cavia, Yedid Hoshen
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we demonstrate that FACTOR significantly outperforms state-of-the-art deepfake detection techniques despite being simple to implement and not relying on any fake data for pretraining. Our code is available at https://github.com/talreiss/FACTOR. We conducted experiments on three face swapping datasets that provide identity-related information: Celeb-DF (Li et al., 2020), DFD (Research et al.) (which is part of FF++ (Rössler et al., 2019)), and DFDC (Dolhansky et al., 2019). The results, presented in Tab. 1, show that while the supervised baselines performed poorly on previously unseen attack scenarios, our method achieved near-perfect accuracy. |
| Researcher Affiliation | Academia | Tal Reiss, School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel; Bar Cavia, School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel; Yedid Hoshen, School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel |
| Pseudocode | No | The paper only describes steps in regular paragraph text without structured formatting. It provides mathematical formulas (Eq. 1, 2, 3) and descriptive text for the FACTOR method, but no explicit pseudocode or algorithm block. |
| Open Source Code | Yes | Our code is available at https://github.com/talreiss/FACTOR. |
| Open Datasets | Yes | We conducted experiments on three face swapping datasets that provide identity-related information: Celeb-DF (Li et al., 2020), DFD (Research et al.) (which is part of FF++ (Rössler et al., 2019)), and DFDC (Dolhansky et al., 2019). We evaluated our method on the Fake AVCeleb video forensics dataset (Khalid et al., 2021). Our evaluation is performed on a random sample of 1000 images from COCO (Lin et al., 2014). |
| Dataset Splits | Yes | To ensure uniformity in our experiments, we uniformly subsampled each video (train or test and for all datasets) into 32 frames. Furthermore, we split each identity's authentic videos into a 50/50 train-test split. |
| Hardware Specification | Yes | For the audio-visual verification component, processing a whole video clip takes approximately 3 seconds on a single NVIDIA RTX 2080 GPU. |
| Software Dependencies | Yes | Specifically, for CLIP's architecture, we leveraged the ViT-B/16 pretrained on the LAION-2B dataset, following OPENCLIP specifications (Ilharco et al., 2021). Furthermore, the checkpoint version of Stable Diffusion we used was v1-5 (Stable Diffusion.). |
| Experiment Setup | Yes | To ensure uniformity in our experiments, we uniformly subsampled each video (train or test and for all datasets) into 32 frames. Furthermore, we split each identity's authentic videos into a 50/50 train-test split. We opt for a simple but effective solution, using the truth score with the λ% lowest value in the video (we choose λ = 3%). Specifically, for CLIP's architecture, we leveraged the ViT-B/16 pretrained on the LAION-2B dataset, following OPENCLIP specifications (Ilharco et al., 2021). |
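The aggregation rule quoted in the Experiment Setup row (take the truth score with the λ% lowest value over a video's frames, λ = 3%) can be sketched as below. This is an illustrative reading of the quoted sentence, not the authors' released code; the function name and interface are assumptions.

```python
import numpy as np

def video_truth_score(frame_scores, lam=0.03):
    """Aggregate per-frame truth scores into a single video-level score.

    Per the quoted setup, the video score is the frame score at the
    lambda-fraction lowest position (lambda = 0.03, i.e. 3%).
    With the 32 uniformly subsampled frames mentioned in the table,
    this reduces to the minimum frame score.
    """
    scores = np.sort(np.asarray(frame_scores, dtype=float))
    # Index of the lambda% lowest value (at least the first element).
    idx = max(0, int(np.ceil(lam * len(scores))) - 1)
    return scores[idx]
```

Under this reading, a single low-scoring frame (one inconsistent with the claimed identity) drives the whole video's score down, which matches the paper's goal of flagging videos that fail the claimed fact anywhere.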