Learning Neural Vocoder from Range-Null Space Decomposition

Authors: Andong Li, Tong Lei, Zhihang Sun, Rilin Chen, Erwei Yin, Xiaodong Li, Chengshi Zheng

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments are conducted on the LJSpeech and LibriTTS benchmarks. Quantitative and qualitative results show that, while enjoying lightweight network parameters, the proposed approach yields state-of-the-art performance among existing advanced methods.
Researcher Affiliation | Collaboration | Institute of Acoustics, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Tencent AI Lab; Nanjing University; Defense Innovation Institute, Academy of Military Sciences (AMS); Tianjin Artificial Intelligence Innovation Center (TAIIC)
Pseudocode | No | The paper describes network architectures and processes in detail (e.g., in Section 3.3 and Figure 3) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code and the pretrained model weights are available at https://github.com/Andong-Li-speech/RNDVoC.
Open Datasets | Yes | Two benchmarks are employed in this study, namely LJSpeech [Keith and Linda, 2017] and LibriTTS [Zen et al., 2019].
Dataset Splits | Yes | The LJSpeech dataset includes 13,100 clean speech clips by a single female speaker, at a sampling rate of 22.05 kHz. Following the division in the open-sourced VITS repository, {12500, 100, 500} clips are used for training, validation, and testing, respectively. The LibriTTS dataset covers diverse recording environments at a sampling rate of 24 kHz. Following the division in [Lee et al., 2023], {train-clean-100, train-clean-360, train-other-500} are used for model training. The dev-clean + dev-other subsets are used for objective comparisons, and test-clean + test-other for subjective evaluations.
Hardware Specification | Yes | Inference speed on CPU is evaluated on an Intel(R) Core(TM) i7-14700F; GPU inference is evaluated on an NVIDIA GeForce RTX 4060 Ti.
Software Dependencies | No | The paper mentions the use of the AdamW optimizer but does not specify version numbers for key software components or libraries such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | A batch size of 16, a segment size of 16384, and an initial learning rate of 2e-4 are used for training. The AdamW optimizer [Loshchilov and Hutter, 2017] is employed, with {β1 = 0.8, β2 = 0.99}. The generator and discriminator are each updated for 1 million steps.
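The LJSpeech division quoted in the Dataset Splits row can be sketched as a simple ordered slicing of the 13,100 clip IDs into {12500, 100, 500}. This is a minimal illustration of the split sizes only; the clip ID format and function name are illustrative, not taken from the VITS repository.

```python
# Sketch of the LJSpeech train/validation/test division described above:
# {12500, 100, 500} clips out of 13,100, following the VITS repository split.
def split_ljspeech(clip_ids):
    assert len(clip_ids) == 13_100, "LJSpeech has 13,100 clips"
    train = clip_ids[:12_500]
    valid = clip_ids[12_500:12_600]
    test = clip_ids[12_600:]
    return train, valid, test

# Illustrative clip IDs; real LJSpeech IDs look like "LJ001-0001".
clips = [f"clip-{i:05d}" for i in range(13_100)]
train, valid, test = split_ljspeech(clips)
print(len(train), len(valid), len(test))  # → 12500 100 500
```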
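For context on the optimizer settings in the Experiment Setup row, below is a textbook single-parameter sketch of one AdamW update [Loshchilov and Hutter, 2017] using the quoted hyperparameters (lr = 2e-4, β1 = 0.8, β2 = 0.99). The `eps` and `weight_decay` values are common defaults assumed for illustration; this is not the paper's training code.

```python
import math

# One AdamW update on a scalar parameter: decoupled weight decay plus
# bias-corrected first/second moment estimates.
def adamw_step(theta, grad, m, v, step, lr=2e-4, beta1=0.8, beta2=0.99,
               eps=1e-8, weight_decay=0.01):
    m = beta1 * m + (1 - beta1) * grad          # first-moment EMA
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment EMA
    m_hat = m / (1 - beta1 ** step)             # bias correction
    v_hat = v / (1 - beta2 ** step)
    # Weight decay is applied directly to theta, decoupled from the gradient.
    theta = theta - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * theta)
    return theta, m, v

theta, m, v = adamw_step(theta=1.0, grad=0.5, m=0.0, v=0.0, step=1)
```

With β1 = 0.8 the first-moment average reacts faster than the common default of 0.9, a choice shared by many GAN-based vocoder recipes.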