Uncertainty-Aware Self-Training for CTC-Based Automatic Speech Recognition

Authors: Eungbeom Kim, Kyogu Lee

AAAI 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we present uncertainty-aware self-training for CTC-based ASR models and experimentally show the effectiveness of the proposed method compared to the baselines. |
| Researcher Affiliation | Academia | 1. Interdisciplinary Program in Artificial Intelligence, Seoul National University; 2. Artificial Intelligence Institute, Seoul National University; 3. Department of Intelligence and Information, Seoul National University |
| Pseudocode | No | The paper includes mathematical equations describing the model and loss functions (Equations 1-13) but does not contain explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | For semi-supervised learning, the LibriSpeech dataset (Panayotov et al. 2015) is considered, following previous ASR semi-supervised learning methods (Kahn, Lee, and Hannun 2020; Park et al. 2020; Xu et al. 2021; Kim et al. 2023; Li, Meng, and Sun 2023; Higuchi et al. 2023). |
| Dataset Splits | Yes | For the labeled training dataset, the 100-hour LibriSpeech train-clean dataset (LS-100) is utilized. For the unlabeled training dataset, two datasets are considered: the 360-hour LibriSpeech train-clean dataset (LS-360) and the 500-hour LibriSpeech train-other dataset (LS-500). Tables 1 and 2 also report results on dev-clean, dev-other, test-clean, and test-other, the standard evaluation splits of LibriSpeech. |
| Hardware Specification | No | The paper states that experiments were conducted but does not specify any hardware details such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions the Adam optimizer (Kingma and Ba 2015) but does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used for implementation. |
| Experiment Setup | Yes | We use a learning rate of 3e-5 with the Adam optimizer (Kingma and Ba 2015), with a 10% warmup stage out of 100 total epochs. The model is frozen for 12.5% of training, except for a newly initialized linear CTC layer. A batch size of 128 and the CTC loss function are used for optimization. We set N = 3 for the number of Dropout implementations and experiment with the hyperparameter α ∈ {0.1, 0.2, 0.3} for Equations 11 and 12. We clip uncertainty values u_pl lower than 1% of the training dataset to stabilize the loss attenuation in Equation 10 and set λ as the clipped criterion. |
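The uncertainty-clipping step quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "lower than 1% of the training dataset" means clipping each utterance's uncertainty u_pl from below at the training set's 1st-percentile value, and that the loss attenuation in Equation 10 (whose exact form is not reproduced in this report) down-weights the per-utterance CTC loss by λ/u_pl.

```python
# Hedged sketch (not the authors' implementation) of uncertainty
# clipping for loss attenuation: uncertainties below the 1st-percentile
# value of the training set are raised to that value (lambda), which
# keeps an attenuation weight that divides by u_pl bounded.

def clip_uncertainties(uncertainties, fraction=0.01):
    """Clip uncertainties from below at the `fraction` order statistic.

    Returns (clipped_values, lambda_), where lambda_ is the clip
    threshold used as the attenuation criterion. The order-statistic
    index below is an assumption made for illustration.
    """
    s = sorted(uncertainties)
    idx = max(0, int(fraction * len(s)) - 1)
    lambda_ = s[idx]
    return [max(u, lambda_) for u in uncertainties], lambda_


def attenuated_loss(ctc_loss, u_pl, lambda_):
    """Hypothetical per-utterance attenuation (Equation 10's exact form
    is assumed): down-weight the loss on high-uncertainty pseudo-labels.
    With u_pl clipped at lambda_, the weight never exceeds 1, which is
    the stabilizing effect the setup paragraph describes."""
    return ctc_loss * (lambda_ / u_pl)
```

Without the clip, a near-zero u_pl would blow up the λ/u_pl weight; clipping makes the most-confident pseudo-labels contribute the full, unattenuated loss instead.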