Uncertainty-Aware Self-Training for CTC-Based Automatic Speech Recognition

Authors: Eungbeom Kim, Kyogu Lee

AAAI 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we present uncertainty-aware self-training for CTC-based ASR models and experimentally show the effectiveness of the proposed method compared to the baselines. |
| Researcher Affiliation | Academia | 1. Interdisciplinary Program in Artificial Intelligence, Seoul National University; 2. Artificial Intelligence Institute, Seoul National University; 3. Department of Intelligence and Information, Seoul National University |
| Pseudocode | No | The paper includes mathematical equations describing the model and loss functions (Equations 1-13) but does not contain explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | For semi-supervised learning, the LibriSpeech dataset (Panayotov et al. 2015) is considered, following previous ASR semi-supervised learning methods (Kahn, Lee, and Hannun 2020; Park et al. 2020; Xu et al. 2021; Kim et al. 2023; Li, Meng, and Sun 2023; Higuchi et al. 2023). |
| Dataset Splits | Yes | For the labeled training dataset, the 100-hour LibriSpeech train-clean dataset (LS-100) is utilized. For the unlabeled training dataset, two datasets are considered: the 360-hour LibriSpeech train-clean dataset (LS-360) and the 500-hour LibriSpeech train-other dataset (LS-500). Tables 1 and 2 also report results on dev-clean, dev-other, test-clean, and test-other, the standard evaluation splits of LibriSpeech. |
| Hardware Specification | No | The paper states that experiments were conducted but does not specify any hardware details such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions the Adam optimizer (Kingma and Ba 2015) but does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used for implementation. |
| Experiment Setup | Yes | We use a learning rate of 3e-5 with the Adam optimizer (Kingma and Ba 2015), with a 10% warmup stage out of 100 total epochs. The model is frozen for 12.5% of training, except for a newly initialized linear CTC layer. A batch size of 128 and the CTC loss function are used for optimization. We set N = 3 for the number of Dropout implementations and experiment with the hyperparameter α ∈ {0.1, 0.2, 0.3} for Equations 11 and 12. We clip uncertainty values u_pl lower than 1% of the training dataset to stabilize the loss attenuation in Equation 10 and set λ as the clipped criterion. |
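The uncertainty-clipping step quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "lower than 1% of the training dataset" means clipping each utterance's uncertainty u_pl from below at the training set's 1st-percentile value, and that the loss attenuation in Equation 10 (whose exact form is not reproduced in this report) down-weights the per-utterance CTC loss by λ/u_pl.

```python
# Hedged sketch (not the authors' implementation) of uncertainty
# clipping for loss attenuation: uncertainties below the 1st-percentile
# value of the training set are raised to that value (lambda), which
# keeps an attenuation weight that divides by u_pl bounded.

def clip_uncertainties(uncertainties, fraction=0.01):
    """Clip uncertainties from below at the `fraction` order statistic.

    Returns (clipped_values, lambda_), where lambda_ is the clip
    threshold used as the attenuation criterion. The order-statistic
    index below is an assumption made for illustration.
    """
    s = sorted(uncertainties)
    idx = max(0, int(fraction * len(s)) - 1)
    lambda_ = s[idx]
    return [max(u, lambda_) for u in uncertainties], lambda_


def attenuated_loss(ctc_loss, u_pl, lambda_):
    """Hypothetical per-utterance attenuation (Equation 10's exact form
    is assumed): down-weight the loss on high-uncertainty pseudo-labels.
    With u_pl clipped at lambda_, the weight never exceeds 1, which is
    the stabilizing effect the setup paragraph describes."""
    return ctc_loss * (lambda_ / u_pl)
```

Without the clip, a near-zero u_pl would blow up the λ/u_pl weight; clipping makes the most-confident pseudo-labels contribute the full, unattenuated loss instead.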