Complex-Cycle-Consistent Diffusion Model for Monaural Speech Enhancement

Authors: Yi Li, Yang Sun, Plamen P Angelov

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on public datasets to demonstrate the effectiveness of our method, highlighting the significant benefits of exploiting the intrinsic relationship between phase and magnitude information to enhance speech. The comparison to conventional diffusion models demonstrates the superiority of SEDM. We extensively perform experiments on several public speech datasets, including IEEE (IEEE Audio and Electroacoustics Group 1969), TIMIT Acoustic-Phonetic Continuous Speech Corpus (TIMIT) (Garofolo et al. 1993), VOICE BANK (VCTK) (Veaux, Yamagishi, and King 2013), and Deep Noise Suppression (DNS) challenge (Reddy et al. 2021).
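The magnitude-phase relationship the paper exploits rests on the standard decomposition of a complex STFT bin; a minimal sketch of that generic arithmetic (illustrative helper names, not the authors' code):

```python
import cmath

def split_bin(x: complex):
    """Decompose one complex STFT bin into magnitude and phase."""
    return abs(x), cmath.phase(x)

def merge_bin(mag: float, phase: float) -> complex:
    """Reconstruct the complex bin; merge_bin(*split_bin(x)) recovers x."""
    return mag * cmath.exp(1j * phase)
```

Enhancing magnitude alone (as many conventional systems do) discards the phase half of this decomposition, which is the coupling the proposed complex-cycle-consistent learning targets.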
Researcher Affiliation | Academia | Yi Li¹, Yang Sun², Plamen P Angelov¹ — ¹School of Computing and Communications, Lancaster University, UK; ²Big Data Institute, University of Oxford, UK
Pseudocode | Yes | The pseudocode of the proposed CCC module is summarized as Algorithm 1. (Algorithm 1: Proposed complex-cycle-consistent learning)
Open Source Code | No | The paper does not contain any explicit statements about providing source code, nor does it include links to a code repository or mention code in supplementary materials.
Open Datasets | Yes | We extensively perform experiments on several public speech datasets, including IEEE (IEEE Audio and Electroacoustics Group 1969), TIMIT Acoustic-Phonetic Continuous Speech Corpus (TIMIT) (Garofolo et al. 1993), VOICE BANK (VCTK) (Veaux, Yamagishi, and King 2013), and Deep Noise Suppression (DNS) challenge (Reddy et al. 2021). To generate noisy speech signals in training and test, we randomly collect and use 10 of 15 noise types ... from Diverse Environments Multichannel Acoustic Noise Database (DEMAND) (Thiemann, Ito, and Vincent 2013).
Dataset Splits | Yes | Evaluations on the IEEE and TIMIT Datasets: The first experiment is conducted on IEEE and TIMIT (IEEE Audio and Electroacoustics Group 1969; Garofolo et al. 1993). In the training and development stages, 600 recordings from 60 speakers and 60 recordings from 6 speakers are randomly selected in each dataset, respectively. ... We randomly generate 11572 noisy mixtures with 10 background noises at one of 4 SNR levels (15, 10, 5, and 0 dB) in the training stage. The test set with 2 speakers, unseen during training, consists of a total of 20 different noise conditions: 5 types of noise sourced from the DEMAND dataset at one of 4 SNRs each (17.5, 12.5, 7.5, and 2.5 dB). This yields 824 test items, with approximately 20 different sentences in each condition per test speaker. ... In the training stage, 75% of the clean speeches are mixed with the background noise but without reverberation at a random SNR in between -5 and 20 dB as (Hao et al. 2021). In the test stage, 150 noisy clips are randomly selected from the blind test dataset without reverberations.
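Generating a noisy mixture at a chosen SNR, as described above, amounts to scaling the noise clip before adding it to the clean speech. A minimal sketch of that standard procedure (`mix_at_snr` is a hypothetical helper name, not the authors' code):

```python
import math

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean-to-noise power ratio equals `snr_db`, then add.

    Solves p_clean / (g**2 * p_noise) == 10**(snr_db / 10) for the gain g.
    """
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in noise) / len(noise)
    g = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + g * n for c, n in zip(clean, noise)]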
Hardware Specification | Yes | All the experiments are run on Tesla V100 GPUs.
Software Dependencies | No | The paper mentions 'Adam optimizer' but does not specify its version, nor does it list other software dependencies (e.g., programming languages, libraries, or frameworks) with version numbers.
Experiment Setup | Yes | Model Configuration: We set the number of diffusion blocks and channels [N, C] as [30, 63], [40, 128], [50, 128] for the small, medium, and large SEDM models (SEDM-S, SEDM-M, SEDM-L), respectively. The number of reverse blocks is equal to the number of diffusion blocks, i.e., M = N. The kernel size of Bi-Dil Conv is 3, and the dilation is doubled at each layer within each block as [1, 2, 4, ..., 2^(n-1)]. Each LSTM in CCC consists of three hidden layers and 30 features in the hidden state. ... The proposed model is trained by using the Adam optimizer with a weight decay of 0.0001, a momentum of 0.9, and a batch size of 64. We train the networks for 200 epochs, warming up the network in the first 20 epochs by training without the CCC losses. The initial learning rate is 0.03, and is multiplied by 0.1 at 120 and 160 epochs.
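The stated step schedule (initial learning rate 0.03, multiplied by 0.1 at epochs 120 and 160) and the per-block dilation doubling can be sketched as follows (hypothetical helper names, assumed from the quoted configuration rather than released code):

```python
def lr_at_epoch(epoch, base_lr=0.03, milestones=(120, 160), gamma=0.1):
    """Step schedule: multiply the learning rate by `gamma` at each milestone."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

def block_dilations(n_layers):
    """Dilation doubled at each layer within a block: [1, 2, 4, ..., 2^(n-1)]."""
    return [2 ** i for i in range(n_layers)]
```

In a PyTorch setup the same schedule would typically be expressed with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[120, 160], gamma=0.1)`, with the paper's "momentum of 0.9" mapping to Adam's first moment coefficient.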