ArrayDPS: Unsupervised Blind Speech Separation with a Diffusion Prior

Authors: Zhongweiyang Xu, Xulin Fan, Zhong-Qiu Wang, Xilin Jiang, Romit Roy Choudhury

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluation results show that ArrayDPS outperforms all baseline unsupervised methods while being comparable to supervised methods in terms of SDR. Audio demos and codes are provided at: https://arraydps.github.io/ArrayDPSDemo/ and https://github.com/ArrayDPS/ArrayDPS. ... Extensive evaluation shows that ArrayDPS can achieve similar performance against recent supervised methods evaluated on ad-hoc microphone arrays, and performs the best among all unsupervised blind speech separation algorithms. ... 4. Experiments and Evaluation
Researcher Affiliation | Academia | 1Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Champaign, USA; 2Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China; 3Columbia University, NYC, USA.
Pseudocode | Yes | Algorithm 1 ArrayDPS. Require: {N, σ_{i∈{0,...,N}}, γ_{i∈{0,...,N−1}}, S_noise} ... Algorithm 2 Posterior Score Approximation. Require: {D_θ, σ_{i∈{0,...,N}}, N_ref, N_fg, ξ_1(τ), ξ_2(τ), λ}
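The quoted "Posterior Score Approximation" builds on the standard diffusion-posterior-sampling (DPS) recipe: an unconditional score from the denoiser plus a likelihood gradient that pulls the sample toward the observed mixture. The sketch below is a generic DPS step, not the paper's exact Algorithm 2; the linear mixing operator `A`, the guidance weight `zeta`, and the identity-Jacobian shortcut are all simplifying assumptions for illustration.

```python
import numpy as np

def dps_posterior_score(denoise, x_t, sigma, y, A, zeta=2.0):
    """Generic DPS score approximation (sketch, not the paper's Algorithm 2).

    For an assumed linear mixing model y ~ A @ x0, the likelihood gradient of
    0.5 * ||y - A x0_hat||^2 w.r.t. x_t is computed under the common DPS
    shortcut d(x0_hat)/d(x_t) ~ I.
    """
    x0_hat = denoise(x_t, sigma)          # Tweedie estimate of the clean signal
    score = (x0_hat - x_t) / sigma**2     # unconditional score from the denoiser
    residual = y - A @ x0_hat             # observation mismatch
    grad = -(A.T @ residual)              # gradient of the data-fit term (Jacobian ~ I)
    return score - zeta * grad            # guided (posterior) score
```

In the paper's setting the unconditional prior is a speech diffusion model and the observation model involves the multichannel array mixing; here both are stand-ins (`denoise`, `A`).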
Open Source Code | Yes | Audio demos and codes are provided at: https://arraydps.github.io/ArrayDPSDemo/ and https://github.com/ArrayDPS/ArrayDPS. ... We have open sourced ArrayDPS in https://github.com/ArrayDPS/ArrayDPS.
Open Datasets | Yes | We train this unconditional speech diffusion model on a clean subset of speech corpus LibriTTS (Zen et al., 2019). ... For evaluation, we use the SMS-WSJ (Drude et al., 2019b) dataset for fixed microphone array evaluation and use the Spatialized WSJ0-2Mix dataset (Wang et al., 2018) for ad-hoc microphone array evaluation.
Dataset Splits | Yes | The dataset consists of 33,561 (~87.4 h), 982 (~2.5 h), and 1,332 (~3.4 h) train, validation, and test mixtures, respectively, all at 8 kHz sampling rate. ... In general, the Spatialized WSJ0-2Mix dataset contains 20,000 (~30 h), 5,000 (~10 h), and 3,000 (~5 h) utterances in training, validation, and testing, respectively.
Hardware Specification | Yes | These models are all trained on a single A100 GPU and converge in about 5-6 days.
Software Dependencies | Yes | For the diffusion denoising architecture, we use the waveform-domain U-Net as MSDM (Mariani et al., 2024), implemented in audio-diffusion-pytorch (v0.0.43). ... We use the open-source torchiva toolkit (Scheibler & Saijo, 2022)
Experiment Setup | Yes | For the default configuration as in row 2a (ArrayDPS-A) in Table 1, we set N = 400 and S_noise = 1 as in Algorithm 1, σ_0 = τ_max = 0.8, σ_N = τ_min = 1e-6, ρ = 10 as in Eq. 55, S_min = 0, S_max = 50, and S_churn = 30 as in Eq. 56, ξ = 2, N_ref = 200, N_fg = 100, and λ = 1.3 as in Algorithm 2. ... We train on speech samples with 65,536 samples (~8.2 s) with batch size 16 and learning rate 0.0001. The learning rate is multiplied by 0.8 every 60,000 training steps. ... We train the model for 840,000 training steps.
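The sampler hyperparameters quoted above (σ_0 = 0.8, σ_N = 1e-6, ρ = 10, S_min/S_max/S_churn) match the shape of the EDM sampler of Karras et al. (2022); the sketch below reproduces that standard noise schedule and churn factor with the paper's reported values. Treating Eq. 55 and Eq. 56 as the EDM forms is an assumption, since the review only lists the constants.

```python
import numpy as np

def karras_sigma_schedule(n=400, sigma_max=0.8, sigma_min=1e-6, rho=10.0):
    """EDM-style (Karras et al., 2022) noise schedule; assumed form of Eq. 55.

    Interpolates between sigma_max and sigma_min in rho-warped space,
    giving N+1 noise levels sigma_0 > ... > sigma_N.
    """
    t = np.arange(n + 1) / n
    return (sigma_max ** (1 / rho)
            + t * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho

def churn_gamma(sigma, n=400, s_churn=30.0, s_min=0.0, s_max=50.0):
    """Stochastic 'churn' factor of the EDM sampler; assumed form of Eq. 56.

    Noise is re-injected only for sigma inside [s_min, s_max].
    """
    if s_min <= sigma <= s_max:
        return min(s_churn / n, np.sqrt(2.0) - 1.0)
    return 0.0
```

With N = 400 and S_churn = 30, the churn factor is 30/400 = 0.075 at every active step, well below the sqrt(2) - 1 cap.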