An Association-based Fusion Method for Speech Enhancement
Authors: Shijie Wang, Qian Guo, Lu Chen, Liang Du, Zikun Jin, Zhian Yuan, Xinyan Liang
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that the AFSE method significantly improves performance in speech enhancement tasks, validating the effectiveness and superiority of our approach. |
| Researcher Affiliation | Academia | (1) Institute of Big Data Science and Industry, Key Laboratory of Evolutionary Science Intelligence of Shanxi Province, Shanxi University, Taiyuan 030006, China; (2) Shanxi Key Laboratory of Big Data Analysis and Parallel Computing, School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China. EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 AFSE Training Procedure. Input: the number of samples M, total epochs N, loss function Q, hops K, activation function σ, concatenation operation Cat, mixing coefficient α. Parameters: a trainable weight matrix W; UNet U; encoder Enc; decoder Dec. 1: for epoch = 1 to N do 2: for m = 1 to M do 3: sample a pair of noisy and clean speeches (X, Y); 4: H^(0) = Enc(X); 5: obtain the adjacency matrix A by Eq. 7; 6: Â = D^(−1/2) A D^(−1/2); 7: H_c = H^(0); 8: for k = 1 to K do 9: H^(k) = (1 − α) Â H^(k−1) + α H^(k−1); 10: H_c = Cat(H_c, H^(k)); 11: end for; 12: X_G = Dec(σ(H_c W)); 13: X̂ = U(X_G); 14: loss = Q(X̂, Y); 15: update the learnable parameters W, U, Enc, Dec; 16: end for; 17: end for |
| Open Source Code | Yes | The code is available at https://github.com/jie019/AFSE_IJCAI2025. |
| Open Datasets | Yes | To evaluate the proposed AFSE, we use two datasets: Voice Bank-DEMAND (VBD) and DNS Challenge. The Voice Bank-DEMAND [Valentini-Botinhao et al., 2016]... The Interspeech 2020 DNS challenge dataset [Reddy et al., 2020] |
| Dataset Splits | Yes | The training/validation and test datasets contain 11,572 utterances at four signal-to-noise ratio (SNR) levels (15, 10, 5, and 0 dB) and 824 utterances at four SNR levels (17.5, 12.5, 7.5, and 2.5 dB), respectively. ...Following [Zheng et al., 2021], we synthesize 500 hours of noisy clips with SNR levels of -5 dB, 0 dB, 5 dB, 10 dB, and 15 dB for training. For evaluation, we use another 150 noisy clips from the test set without reverberation. |
| Hardware Specification | Yes | The model is trained on the PyTorch platform with an NVIDIA RTX 4090 GPU. |
| Software Dependencies | No | The model is trained on the PyTorch platform. (No version number is provided for PyTorch.) |
| Experiment Setup | Yes | We use the Adam optimizer with a batch size of 8 to train the proposed model, and the learning rate is initialized to 1e-3. Moreover, we train the model for 60 epochs on both datasets. |
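The graph-propagation core of Algorithm 1 (steps 6–11: symmetric normalization, K-hop mixing, and concatenation) can be sketched in NumPy. This is a hypothetical reimplementation for illustration only, not the authors' released code; the function name `afse_propagate` and all shapes are assumptions, and the encoder/decoder/UNet stages are omitted.

```python
import numpy as np

def afse_propagate(H0, A, K, alpha):
    """Sketch of Algorithm 1, steps 6-11 (hypothetical reimplementation).

    H0    : (n, d) initial node features from the encoder
    A     : (n, n) adjacency matrix (Eq. 7 in the paper)
    K     : number of propagation hops
    alpha : mixing coefficient
    Returns H_c, the (n, d*(K+1)) concatenation of all hop features.
    """
    # Step 6: symmetric normalization, A_hat = D^(-1/2) A D^(-1/2)
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    A_hat = D_inv_sqrt @ A @ D_inv_sqrt

    H = H0
    Hc = H0  # step 7: H_c initialized with H^(0)
    for _ in range(K):
        # Step 9: H^(k) = (1 - alpha) * A_hat @ H^(k-1) + alpha * H^(k-1)
        H = (1.0 - alpha) * (A_hat @ H) + alpha * H
        # Step 10: H_c = Cat(H_c, H^(k)) along the feature axis
        Hc = np.concatenate([Hc, H], axis=1)
    return Hc
```

In the full pipeline, `Hc` would then pass through a linear layer with weight `W`, the activation σ, the decoder, and the UNet (steps 12–13); those trainable stages are left out of this sketch.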