Learning Neural Vocoder from Range-Null Space Decomposition
Authors: Andong Li, Tong Lei, Zhihang Sun, Rilin Chen, Erwei Yin, Xiaodong Li, Chengshi Zheng
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments are conducted on the LJSpeech and LibriTTS benchmarks. Quantitative and qualitative results show that while enjoying lightweight network parameters, the proposed approach yields state-of-the-art performance among existing advanced methods. |
| Researcher Affiliation | Collaboration | 1 Institute of Acoustics, Chinese Academy of Sciences; 2 University of Chinese Academy of Sciences; 3 Tencent AI Lab; 4 Nanjing University; 5 Defense Innovation Institute, Academy of Military Sciences (AMS); 6 Tianjin Artificial Intelligence Innovation Center (TAIIC) |
| Pseudocode | No | The paper describes network architectures and processes in detail (e.g., in Section 3.3 and Figure 3) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and the pretrained model weights are available at https://github.com/Andong-Li-speech/RNDVoC. |
| Open Datasets | Yes | Two benchmarks are employed in this study, namely LJSpeech [Ito and Johnson, 2017] and LibriTTS [Zen et al., 2019]. |
| Dataset Splits | Yes | The LJSpeech dataset includes 13,100 clean speech clips from a single female speaker, with a sampling rate of 22.05 kHz. Following the division in the open-sourced VITS repository, {12500, 100, 500} clips are used for training, validation, and testing, respectively. The LibriTTS dataset covers diverse recording environments with a sampling rate of 24 kHz. Following the division in [Lee et al., 2023], {train-clean-100, train-clean-360, train-other-500} are used for model training. The subsets dev-clean + dev-other are for objective comparisons, and test-clean + test-other are for subjective evaluations. |
| Hardware Specification | Yes | The inference speed on a CPU is evaluated with an Intel(R) Core(TM) i7-14700F. For GPU, it is evaluated with an NVIDIA GeForce RTX 4060 Ti. |
| Software Dependencies | No | The paper mentions the use of the AdamW optimizer but does not specify version numbers for any key software components or libraries such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | A batch size of 16, a segment size of 16384, and an initial learning rate of 2e-4 are used for training. The AdamW optimizer [Loshchilov and Hutter, 2017] is employed, with {β1 = 0.8, β2 = 0.99}. The generator and discriminator are each updated for 1 million steps. |
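The reported split sizes and training hyperparameters can be sanity-checked with a small sketch. This is illustrative only (variable names are not from the authors' code): it verifies that the LJSpeech split covers the full corpus and derives the training segment duration implied by the segment size and sampling rate.

```python
# Hedged sketch: sanity-checking the reported LJSpeech split and the
# training segment length. Values come from the table above; names are
# illustrative, not taken from the released RNDVoC code.

ljspeech_total = 13_100                       # total clips in LJSpeech
split = {"train": 12_500, "val": 100, "test": 500}
assert sum(split.values()) == ljspeech_total  # splits cover the whole corpus

segment_size = 16_384         # samples per training segment (reported)
sample_rate_lj = 22_050       # Hz, LJSpeech sampling rate
segment_seconds = segment_size / sample_rate_lj
print(f"{segment_seconds:.3f} s per training segment")  # ≈ 0.743 s
```

At 24 kHz (LibriTTS) the same 16384-sample segment corresponds to roughly 0.683 s, so each training example covers well under a second of audio in either case.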