Uncertainty-Aware Self-Training for CTC-Based Automatic Speech Recognition
Authors: Eungbeom Kim, Kyogu Lee
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we present uncertainty-aware self-training for CTC-based ASR models and experimentally show the effectiveness of the proposed method compared to the baselines. |
| Researcher Affiliation | Academia | 1Interdisciplinary Program in Artificial Intelligence, Seoul National University 2Artificial Intelligence Institute, Seoul National University 3Department of Intelligence and Information, Seoul National University EMAIL |
| Pseudocode | No | The paper includes mathematical equations describing the model and loss functions (e.g., Equation 1-13) but does not contain explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository. |
| Open Datasets | Yes | For semi-supervised learning, the LibriSpeech dataset (Panayotov et al. 2015) is considered to follow the previous ASR semi-supervised learning methods (Kahn, Lee, and Hannun 2020; Park et al. 2020; Xu et al. 2021; Kim et al. 2023; Li, Meng, and Sun 2023; Higuchi et al. 2023). |
| Dataset Splits | Yes | For the labeled training dataset, the 100-hour LibriSpeech train-clean dataset (LS-100) is utilized. For the unlabeled training dataset, we consider two datasets: the 360-hour LibriSpeech train-clean dataset (LS-360) and the 500-hour LibriSpeech train-other dataset (LS-500). Tables 1 and 2 also present results for 'dev-clean', 'dev-other', 'test-clean', and 'test-other', which are standard splits of the LibriSpeech dataset. |
| Hardware Specification | No | The paper states that experiments were conducted but does not specify any hardware details such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer (Kingma and Ba 2015)' but does not specify version numbers for Adam or any other software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used for implementation. |
| Experiment Setup | Yes | We use a learning rate of 3e-5 with the Adam optimizer (Kingma and Ba 2015) with a 10% warmup stage out of 100 total epochs. Also, the model is frozen for 12.5% of the training, except for a newly initialized linear CTC layer. A batch size of 128 is used, and the CTC loss function is utilized for optimization. We set N = 3 for the number of Dropout implementations. We experiment with the hyperparameter α ∈ {0.1, 0.2, 0.3} for Equations 11 and 12. We clip the uncertainty value u_pl lower than 1% of the training dataset to stabilize the loss attenuation in Equation 10 and set λ as the clipped criterion. |
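The setup row above mentions N = 3 Dropout implementations for estimating pseudo-label uncertainty and clipping that uncertainty at the 1% percentile λ of the training set to stabilize loss attenuation. A minimal NumPy sketch of that mechanism, assuming a Monte Carlo Dropout-style estimate where uncertainty is the variance across stochastic forward passes (the function names, the variance-based proxy, and the toy model are illustrative assumptions, not the paper's exact Equation 10):

```python
import numpy as np

def mc_dropout_uncertainty(forward_pass, x, n_passes=3, rng=None):
    """Per-frame uncertainty from N stochastic forward passes.

    `forward_pass` is any callable returning frame-level posteriors with
    dropout active; the variance across passes, averaged over classes,
    serves here as an illustrative uncertainty proxy.
    """
    rng = rng or np.random.default_rng(0)
    outputs = np.stack([forward_pass(x, rng) for _ in range(n_passes)])
    return outputs.var(axis=0).mean(axis=-1)  # shape: (frames,)

def clip_uncertainty(u, train_uncertainties, percentile=1.0):
    """Clip uncertainties below the given percentile of the training set,
    with lambda set to that clipping criterion (as the setup describes)."""
    lam = np.percentile(train_uncertainties, percentile)
    return np.maximum(u, lam), lam

# Toy stand-in for a dropout-enabled model: softmax over noisy logits.
def fake_forward(x, rng):
    logits = x + rng.normal(scale=0.5, size=x.shape)  # dropout-like noise
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

x = np.zeros((4, 5))  # 4 frames, 5 output classes
u = mc_dropout_uncertainty(fake_forward, x, n_passes=3)
train_u = np.abs(np.random.default_rng(1).normal(size=1000))
u_clipped, lam = clip_uncertainty(u, train_u)
```

In this sketch, clipping puts a floor λ under the uncertainty, so the attenuation weight derived from it cannot blow up for near-zero uncertainty values, which matches the stated motivation of stabilizing the loss attenuation.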