An Association-based Fusion Method for Speech Enhancement

Authors: Shijie Wang, Qian Guo, Lu Chen, Liang Du, Zikun Jin, Zhian Yuan, Xinyan Liang

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that the AFSE method significantly improves performance in speech enhancement tasks, validating the effectiveness and superiority of our approach.
Researcher Affiliation | Academia | 1 Institute of Big Data Science and Industry, Key Laboratory of Evolutionary Science Intelligence of Shanxi Province, Shanxi University, Taiyuan 030006, China; 2 Shanxi Key Laboratory of Big Data Analysis and Parallel Computing, School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
Pseudocode | Yes | Algorithm 1: AFSE Training Procedure
Input: the number of samples M, total epochs N, loss function Q, hops K, activation function σ, concatenation operation Cat, mixing coefficient α
Parameters: W: a trainable weight matrix; U: UNet; Enc: encoder; Dec: decoder
1: for epoch = 1 to N do
2:   for m = 1 to M do
3:     Sample a pair of noisy and clean speeches (X, Y)
4:     H^(0) = Enc(X)
5:     Obtain the adjacency matrix A by Eq. 7
6:     Ã = D^(-1/2) A D^(-1/2)
7:     Hc = H^(0)
8:     for k = 1 to K do
9:       H^(k) = (1 - α) Ã H^(k-1) + α H^(k-1)
10:      Hc = Cat(Hc, H^(k))
11:     end for
12:     X_G = Dec(σ(Hc W))
13:     X̂ = U(X_G)
14:     loss = Q(X̂, Y)
15:     Update the learnable parameters W, U, Enc, Dec
16:   end for
17: end for
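The propagation loop in Algorithm 1 (lines 6-11) can be sketched in NumPy as below. This is a minimal illustration, not the authors' implementation: the adjacency matrix and feature sizes are random stand-ins (the paper derives A from the encoder output via its Eq. 7), and the symmetric normalization D^(-1/2) A D^(-1/2) is assumed from standard graph-convolution practice.

```python
import numpy as np

def symmetric_normalize(A):
    """A_tilde = D^(-1/2) A D^(-1/2); assumes self-loops are already in A."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def propagate(H0, A, K=3, alpha=0.1):
    """K hops of H^(k) = (1 - alpha) * A_tilde @ H^(k-1) + alpha * H^(k-1),
    concatenating every hop's output (the Cat step) along the feature axis."""
    A_tilde = symmetric_normalize(A)
    H, Hc = H0, H0
    for _ in range(K):
        H = (1.0 - alpha) * (A_tilde @ H) + alpha * H
        Hc = np.concatenate([Hc, H], axis=1)
    return Hc  # shape: (nodes, (K + 1) * channels)

rng = np.random.default_rng(0)
A = rng.random((16, 16))
A = (A + A.T) / 2 + np.eye(16)          # symmetric adjacency with self-loops
H0 = rng.standard_normal((16, 8))       # stand-in for Enc(X)
Hc = propagate(H0, A, K=3, alpha=0.1)
print(Hc.shape)  # (16, 32)
```

The concatenation means the decoder sees features from every hop, not just the last one, which matches line 10 of the pseudocode (Hc = Cat(Hc, H^(k))).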
Open Source Code | Yes | The code is available at https://github.com/jie019/AFSE_IJCAI2025.
Open Datasets | Yes | To evaluate the proposed AFSE, we use two datasets: Voice Bank-DEMAND (VBD) and DNS Challenge. The Voice Bank-DEMAND [Valentini-Botinhao et al., 2016]... The Interspeech 2020 DNS Challenge dataset [Reddy et al., 2020]
Dataset Splits | Yes | The training/validation and test datasets contain 11,572 utterances at four signal-to-noise ratio (SNR) levels (15, 10, 5, and 0 dB) and 824 utterances at four SNR levels (17.5, 12.5, 7.5, and 2.5 dB), respectively. ... Following [Zheng et al., 2021], we synthesize 500 hours of noisy clips with SNR levels of -5 dB, 0 dB, 5 dB, 10 dB, and 15 dB for training. For evaluation, we use another 150 noisy clips from the test set without reverberation.
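Synthesizing a noisy clip at a prescribed SNR, as both datasets above do, amounts to scaling the noise so its power matches the target ratio. A minimal NumPy sketch (the sine "clean" signal and white noise are illustrative stand-ins, not the datasets' actual audio):

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + scaled_noise has the requested SNR in dB."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = clean_power / (10.0 ** (snr_db / 10.0))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return clean + scaled_noise

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of a 440 Hz tone
noise = rng.standard_normal(16000)                           # white noise
noisy = mix_at_snr(clean, noise, snr_db=5)

# Verify the achieved SNR of the mixture.
achieved = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(achieved, 1))  # 5.0
```

Repeating this over the SNR grid listed above (e.g. -5 to 15 dB in 5 dB steps) yields a training set spanning a range of noise severities.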
Hardware Specification | Yes | The model is trained on the PyTorch platform with an NVIDIA RTX 4090 GPU.
Software Dependencies | No | The model is trained on the PyTorch platform. (No version number is provided for PyTorch.)
Experiment Setup | Yes | We use the Adam optimizer with a batch size of 8 to train the proposed model, and the learning rate is initialized to 1e-3. Moreover, we train the model for 60 epochs on both datasets.