An Association-based Fusion Method for Speech Enhancement

Authors: Shijie Wang, Qian Guo, Lu Chen, Liang Du, Zikun Jin, Zhian Yuan, Xinyan Liang

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that the AFSE method significantly improves performance in speech enhancement tasks, validating the effectiveness and superiority of our approach.
Researcher Affiliation | Academia | 1 Institute of Big Data Science and Industry, Key Laboratory of Evolutionary Science Intelligence of Shanxi Province, Shanxi University, Taiyuan 030006, China; 2 Shanxi Key Laboratory of Big Data Analysis and Parallel Computing, School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
Pseudocode | Yes | Algorithm 1: AFSE Training Procedure
Input: the number of samples M, total epochs N, loss function Q, hops K, activation function σ, concatenation operation Cat, mixing coefficient α
Parameters: W: a trainable weight matrix; U: UNet; Enc: encoder; Dec: decoder
1: for epoch = 1 to N do
2:   for m = 1 to M do
3:     Sample a pair of noisy and clean speeches (X, Y)
4:     H^(0) = Enc(X)
5:     Obtain the adjacency matrix A by Eq. 7
6:     Ã = D^(-1/2) A D^(-1/2)
7:     Hc = H^(0)
8:     for k = 1 to K do
9:       H^(k) = (1 - α) Ã H^(k-1) + α H^(k-1)
10:      Hc = Cat(Hc, H^(k))
11:     end for
12:     X_G = Dec(σ(Hc W))
13:     X̂ = U(X_G)
14:     loss = Q(X̂, Y)
15:     Update the learnable parameters W, U, Enc, Dec
16:   end for
17: end for
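The propagation loop in Algorithm 1 (lines 6-11) can be sketched in NumPy as below. This is a minimal illustration, not the authors' implementation: the adjacency matrix and feature sizes are random stand-ins (the paper derives A from the encoder output via its Eq. 7), and the symmetric normalization D^(-1/2) A D^(-1/2) is assumed from standard graph-convolution practice.

```python
import numpy as np

def symmetric_normalize(A):
    """A_tilde = D^(-1/2) A D^(-1/2); assumes self-loops are already in A."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def propagate(H0, A, K=3, alpha=0.1):
    """K hops of H^(k) = (1 - alpha) * A_tilde @ H^(k-1) + alpha * H^(k-1),
    concatenating every hop's output (the Cat step) along the feature axis."""
    A_tilde = symmetric_normalize(A)
    H, Hc = H0, H0
    for _ in range(K):
        H = (1.0 - alpha) * (A_tilde @ H) + alpha * H
        Hc = np.concatenate([Hc, H], axis=1)
    return Hc  # shape: (nodes, (K + 1) * channels)

rng = np.random.default_rng(0)
A = rng.random((16, 16))
A = (A + A.T) / 2 + np.eye(16)          # symmetric adjacency with self-loops
H0 = rng.standard_normal((16, 8))       # stand-in for Enc(X)
Hc = propagate(H0, A, K=3, alpha=0.1)
print(Hc.shape)  # (16, 32)
```

The concatenation means the decoder sees features from every hop, not just the last one, which matches line 10 of the pseudocode (Hc = Cat(Hc, H^(k))).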
Open Source Code | Yes | The code is available at https://github.com/jie019/AFSE_IJCAI2025.
Open Datasets | Yes | To evaluate the proposed AFSE, we use two datasets: Voice Bank-DEMAND (VBD) and DNS Challenge. The Voice Bank-DEMAND [Valentini-Botinhao et al., 2016]... The Interspeech 2020 DNS Challenge dataset [Reddy et al., 2020]
Dataset Splits | Yes | The training/validation and test datasets contain 11,572 utterances at four signal-to-noise ratio (SNR) levels (15, 10, 5, and 0 dB) and 824 utterances at four SNR levels (17.5, 12.5, 7.5, and 2.5 dB), respectively. ... Following [Zheng et al., 2021], we synthesize 500 hours of noisy clips with SNR levels of -5 dB, 0 dB, 5 dB, 10 dB, and 15 dB for training. For evaluation, we use another 150 noisy clips from the test set without reverberation.
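Synthesizing a noisy clip at a prescribed SNR, as both datasets above do, amounts to scaling the noise so its power matches the target ratio. A minimal NumPy sketch (the sine "clean" signal and white noise are illustrative stand-ins, not the datasets' actual audio):

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + scaled_noise has the requested SNR in dB."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = clean_power / (10.0 ** (snr_db / 10.0))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return clean + scaled_noise

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of a 440 Hz tone
noise = rng.standard_normal(16000)                           # white noise
noisy = mix_at_snr(clean, noise, snr_db=5)

# Verify the achieved SNR of the mixture.
achieved = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(achieved, 1))  # 5.0
```

Repeating this over the SNR grid listed above (e.g. -5 to 15 dB in 5 dB steps) yields a training set spanning a range of noise severities.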
Hardware Specification | Yes | The model is trained on the PyTorch platform with an NVIDIA RTX 4090 GPU.
Software Dependencies | No | The model is trained on the PyTorch platform. (No version number is provided for PyTorch.)
Experiment Setup | Yes | We use the Adam optimizer with a batch size of 8 to train the proposed model, and the learning rate is initialized to 1e-3. Moreover, we train the model for 60 epochs on both datasets.