Region-Based Optimization in Continual Learning for Audio Deepfake Detection
Authors: Yujie Chen, Jiangyan Yi, Cunhang Fan, Jianhua Tao, Yong Ren, Siding Zeng, Chu Yuan Zhang, Xinrui Yan, Hao Gu, Jun Xue, Chenglong Wang, Zhao Lv, Xiaohui Zhang
AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show our method achieves a 21.3% improvement in EER over the state-of-the-art continual learning approach RWM for audio deepfake detection. Moreover, the effectiveness of RegO extends beyond the audio deepfake detection domain: we conduct extensive experiments on the EVDA benchmark to validate the effectiveness of our method, and we additionally perform a general study whose results indicate that our approach holds potential significance in other domains, such as image recognition, without being limited to a specific field. |
| Researcher Affiliation | Academia | 1 School of Computer Science and Technology, Anhui University, China 2 Institute of Automation, Chinese Academy of Sciences 3 Department of Automation, Tsinghua University 4 Beijing National Research Center for Information Science and Technology, Tsinghua University EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Region-Based Optimization. Require: Training data from different datasets, η (learning rate), R (region matrix set). 1: for every dataset k do 2: for every batch b do 3: if k = 1 then 4: Update θ_k: θ_k ← θ_k − ηw 5: else 6: Compute memory matrix M_k by Eq. (13) 7: Compute Ebbinghaus matrix E_k by Eq. (14) 8: Compute R_k by combining region matrix set R 9: g_A ← g ⊙ I{R_k[i][j]=0} 10: g_p ← ((g · ĝ)/‖ĝ‖²) ĝ 11: g_B ← g_p ⊙ I{R_k[i][j]=1} 12: g_o ← g − g_p 13: g_C ← g_o ⊙ I{R_k[i][j]=2} 14: β ← Σ_{l=1}^{u} N_l / Σ_{l=1}^{u+v} N_l 15: g̃ ← β·g_p + (1−β)·g_o 16: g_D ← g̃ ⊙ I{R_k[i][j]=3} 17: Initialization: w ← 0 18: w ← g_A + g_B + g_C + g_D 19: Update θ_k: θ_k ← θ_k − ηw 20: end if 21: end for 22: Compute the k-th region matrix R_k by Eqs. (1)–(3) 23: R ← R ∪ R_k 24: end for |
| Open Source Code | Yes | The code is available at https://github.com/cyjie429/RegO. |
| Open Datasets | Yes | The experiments are performed on a continual learning benchmark, EVDA (Zhang, Yi, and Tao 2024), for speech deepfake detection, which includes eight publicly available and popular datasets specifically designed for incremental-synthesis-algorithm audio deepfake detection. Additionally, we carry out a general study in the field of image recognition using the well-established continual learning benchmark CLEAR (Lin et al. 2021). The EVDA benchmark datasets, from Exp1 to Exp8, are FMFCC (Zhang et al. 2021), In the Wild (Müller et al. 2022), ADD 2022 (Yi et al. 2022), ASVspoof2015 (Wu et al. 2017), ASVspoof2019 (Todisco et al. 2019), ASVspoof2021 (Yamagishi et al. 2021), FoR (Reimao and Tzerpos 2019), and HAD (Yi et al. 2021). |
| Dataset Splits | Yes | For the EVDA baseline, 2000 samples are randomly sampled from each dataset as the training set, and 5000 samples are sampled as the test set. |
| Hardware Specification | Yes | on an NVIDIA A100 80GB GPU. |
| Software Dependencies | No | We use the Adam optimizer to finetune the simple MLP, with a learning rate η of 0.0001 and a batch size of 32, on an NVIDIA A100 80GB GPU. |
| Experiment Setup | Yes | We use the Adam optimizer to finetune the simple MLP, with a learning rate η of 0.0001 and a batch size of 32, on an NVIDIA A100 80GB GPU. ... For the image recognition model, we use a pre-trained ResNet-50 (He et al. 2016) as the feature extractor, which is frozen during continual learning, generating 2048-dimensional features. The downstream classifier has three linear layers: 2048 to 1024, 1024 to 512, and 512 to 2. We set the initial learning rate to 0.1, a batch size of 512, and use the SGD optimizer with 0.9 momentum. |
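The per-parameter update in the Pseudocode row (Algorithm 1, steps 9-18) combines four gradient components according to an integer region label. A minimal NumPy sketch of that combination step, assuming flattened gradients and using hypothetical names (`g_hat` stands for the reference gradient direction of earlier tasks; the region-matrix and memory/Ebbinghaus computations from Eqs. (1)-(3), (13), (14) are out of scope here):

```python
import numpy as np

def region_based_update(g, g_hat, region, beta):
    """Combine gradient components per Algorithm 1, steps 9-18 (sketch).

    g      : current-task gradient, flattened
    g_hat  : reference gradient direction from earlier tasks
    region : integer label per parameter (0, 1, 2, or 3)
    beta   : mixing coefficient from sample counts (step 14)
    """
    # Step 10: project g onto g_hat: g_p = ((g . g_hat) / ||g_hat||^2) g_hat
    g_p = (g @ g_hat) / (g_hat @ g_hat) * g_hat
    # Step 12: orthogonal component
    g_o = g - g_p
    # Step 15: beta-weighted mixture of the two components
    g_tilde = beta * g_p + (1 - beta) * g_o

    # Steps 9, 11, 13, 16-18: masked sum over the four regions
    w = np.zeros_like(g)
    w[region == 0] = g[region == 0]            # region 0: raw gradient
    w[region == 1] = g_p[region == 1]          # region 1: projected part
    w[region == 2] = g_o[region == 2]          # region 2: orthogonal part
    w[region == 3] = g_tilde[region == 3]      # region 3: mixed part
    return w
```

The masked assignments are equivalent to forming g_A + g_B + g_C + g_D with indicator masks, since the four regions partition the parameters; the returned `w` then drives the update θ_k ← θ_k − ηw.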
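The image-recognition setup above trains the classifier with SGD at learning rate 0.1 and momentum 0.9. For readers checking the optimizer hyperparameters, a minimal NumPy sketch of one classical momentum step (standard formulation; variable names are illustrative, not from the paper):

```python
import numpy as np

def sgd_momentum_step(theta, grad, velocity, lr=0.1, momentum=0.9):
    """One SGD step with classical momentum (lr=0.1, momentum=0.9 as in the setup)."""
    velocity = momentum * velocity + grad  # decaying accumulation of gradients
    theta = theta - lr * velocity          # descend along the smoothed direction
    return theta, velocity
```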