Region-Based Optimization in Continual Learning for Audio Deepfake Detection
Authors: Yujie Chen, Jiangyan Yi, Cunhang Fan, Jianhua Tao, Yong Ren, Siding Zeng, Chu Yuan Zhang, Xinrui Yan, Hao Gu, Jun Xue, Chenglong Wang, Zhao Lv, Xiaohui Zhang
AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show our method achieves a 21.3% improvement in EER over the state-of-the-art continual learning approach RWM for audio deepfake detection. Moreover, the effectiveness of RegO extends beyond the audio deepfake detection domain: we conduct extensive experiments on the EVDA benchmark to validate the effectiveness of our method, and we additionally perform a general study whose results indicate that our approach holds potential significance in other domains, such as image recognition, without being limited to a specific field. |
| Researcher Affiliation | Academia | 1 School of Computer Science and Technology, Anhui University, China 2 Institute of Automation, Chinese Academy of Sciences 3 Department of Automation, Tsinghua University 4 Beijing National Research Center for Information Science and Technology, Tsinghua University EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Region-Based Optimization. Require: Training data from different datasets, η (learning rate), R (region matrix set). 1: for every dataset k do 2: for every batch b do 3: if k = 1 then 4: Update θ_k: θ_k ← θ_k − ηw 5: else 6: Compute memory matrix M_k by Eq. (13) 7: Compute Ebbinghaus matrix E_k by Eq. (14) 8: Compute R_k by combining region matrix set R 9: g_A ← g ⊙ I{R_k[i][j]=0} 10: g_p ← ((g · ĝ)/‖ĝ‖²) ĝ 11: g_B ← g_p ⊙ I{R_k[i][j]=1} 12: g_o ← g − g_p 13: g_C ← g_o ⊙ I{R_k[i][j]=2} 14: β ← Σ_{l=1}^{u} N_l / Σ_{l=1}^{u+v} N_l 15: g̃ ← β·g_p + (1−β)·g_o 16: g_D ← g̃ ⊙ I{R_k[i][j]=3} 17: Initialization: w ← 0 18: w ← g_A + g_B + g_C + g_D 19: Update θ_k: θ_k ← θ_k − ηw 20: end if 21: end for 22: Compute the k-th region matrix R_k by Eqs. (1)–(3) 23: R ← R ∪ R_k 24: end for |
| Open Source Code | Yes | The code is available at https://github.com/cyjie429/RegO. |
| Open Datasets | Yes | The experiments are performed on a continual learning benchmark, EVDA (Zhang, Yi, and Tao 2024), for speech deepfake detection, which includes eight publicly available and popular datasets specifically designed for incremental-synthesis-algorithm audio deepfake detection. Additionally, we carry out a general study in the field of image recognition using the well-established continual learning benchmark CLEAR (Lin et al. 2021). The EVDA benchmark datasets, from Exp1 to Exp8, are FMFCC (Zhang et al. 2021), In the Wild (Müller et al. 2022), ADD 2022 (Yi et al. 2022), ASVspoof2015 (Wu et al. 2017), ASVspoof2019 (Todisco et al. 2019), ASVspoof2021 (Yamagishi et al. 2021), FoR (Reimao and Tzerpos 2019), and HAD (Yi et al. 2021). |
| Dataset Splits | Yes | For the EVDA baseline, 2000 samples are randomly sampled from each dataset as the training set, and 5000 samples are sampled as the test set. |
| Hardware Specification | Yes | on an NVIDIA A100 80GB GPU. |
| Software Dependencies | No | We use the Adam optimizer to finetune the simple MLP, with a learning rate η of 0.0001 and a batch size of 32, on an NVIDIA A100 80GB GPU. |
| Experiment Setup | Yes | We use the Adam optimizer to finetune the simple MLP, with a learning rate η of 0.0001 and a batch size of 32, on an NVIDIA A100 80GB GPU. ... For the image recognition model, we use a pre-trained ResNet-50 (He et al. 2016) as the feature extractor, which is frozen during continual learning, generating 2048-dimensional features. The downstream classifier has three linear layers: 2048 to 1024, 1024 to 512, and 512 to 2. We set the initial learning rate to 0.1, a batch size of 512, and use the SGD optimizer with 0.9 momentum. |
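The per-parameter update in the Pseudocode row (Algorithm 1, steps 9-18) combines four gradient components according to an integer region label. A minimal NumPy sketch of that combination step, assuming flattened gradients and using hypothetical names (`g_hat` stands for the reference gradient direction of earlier tasks; the region-matrix and memory/Ebbinghaus computations from Eqs. (1)-(3), (13), (14) are out of scope here):

```python
import numpy as np

def region_based_update(g, g_hat, region, beta):
    """Combine gradient components per Algorithm 1, steps 9-18 (sketch).

    g      : current-task gradient, flattened
    g_hat  : reference gradient direction from earlier tasks
    region : integer label per parameter (0, 1, 2, or 3)
    beta   : mixing coefficient from sample counts (step 14)
    """
    # Step 10: project g onto g_hat: g_p = ((g . g_hat) / ||g_hat||^2) g_hat
    g_p = (g @ g_hat) / (g_hat @ g_hat) * g_hat
    # Step 12: orthogonal component
    g_o = g - g_p
    # Step 15: beta-weighted mixture of the two components
    g_tilde = beta * g_p + (1 - beta) * g_o

    # Steps 9, 11, 13, 16-18: masked sum over the four regions
    w = np.zeros_like(g)
    w[region == 0] = g[region == 0]            # region 0: raw gradient
    w[region == 1] = g_p[region == 1]          # region 1: projected part
    w[region == 2] = g_o[region == 2]          # region 2: orthogonal part
    w[region == 3] = g_tilde[region == 3]      # region 3: mixed part
    return w
```

The masked assignments are equivalent to forming g_A + g_B + g_C + g_D with indicator masks, since the four regions partition the parameters; the returned `w` then drives the update θ_k ← θ_k − ηw.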
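The image-recognition setup above trains the classifier with SGD at learning rate 0.1 and momentum 0.9. For readers checking the optimizer hyperparameters, a minimal NumPy sketch of one classical momentum step (standard formulation; variable names are illustrative, not from the paper):

```python
import numpy as np

def sgd_momentum_step(theta, grad, velocity, lr=0.1, momentum=0.9):
    """One SGD step with classical momentum (lr=0.1, momentum=0.9 as in the setup)."""
    velocity = momentum * velocity + grad  # decaying accumulation of gradients
    theta = theta - lr * velocity          # descend along the smoothed direction
    return theta, velocity
```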