Code-switching Mediated Sentence-level Semantic Learning
Authors: Shuai Zhang, Jiangyan Yi, Zhengqi Wen, Jianhua Tao, Feihu Che, Jinyang Wu, Ruibo Fu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we conduct thorough experiments on speech recognition, speech translation, and language modeling tasks. The experimental results fully demonstrate that the proposed method can widely improve the performance of code-switching related tasks. |
| Researcher Affiliation | Academia | Shuai Zhang1, Jiangyan Yi1, Zhengqi Wen1, Jianhua Tao1*, Feihu Che1*, Jinyang Wu1, Ruibo Fu2; 1Department of Automation & BNRist, Tsinghua University; 2Institute of Automation, Chinese Academy of Sciences; EMAIL, EMAIL |
| Pseudocode | No | The paper describes its methodology using textual descriptions, mathematical equations, and architectural diagrams (Figure 2), but does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing code, nor does it include links to a code repository or mention code in supplementary materials. |
| Open Datasets | Yes | We conduct our experiments on three popular publicly available datasets, including the ASRU 2019 Mandarin-English code-switching challenge dataset (Shi, Feng, and Xie 2020), Fisher dataset (Cieri, Miller, and Walker 2004) and TED English-Chinese dataset (Liu et al. 2019). |
| Dataset Splits | Yes | Statistical information on the code-switching dataset is shown in Table 1. ... The Fisher data consists of three evaluation sets (Dev/Dev2/Test) that together contain approximately a thousand instances of code-switching with corresponding translations in monolingual English. We therefore combined all the code-switching data from the three evaluation sets as a test set. |
| Hardware Specification | Yes | We use Adam optimizer with β1 = 0.9, β2 = 0.998, ϵ = 1e-8 on 4 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer', the 'transformer architecture', and 'Llama 3 70B' for data processing, but does not provide specific version numbers for these or for any other software libraries/frameworks used to implement their models. |
| Experiment Setup | Yes | The attention dimensions of the encoder and decoder are both 512 and the number of the head is 4. The dimension of position-wise feed-forward networks is 1024. The number of acoustic encoder blocks and decoder blocks are 12 and 6 respectively. To avoid over-fitting, the unified label smoothing technique is used, and the parameter is set to 0.1. SpecAugment with frequency masking (F=30, mF=2) and time masking (T=40, mT=2) is used to improve the performance of the models (Park et al. 2019). Meanwhile, we set the residual dropout as 0.1, where the residual dropout is applied to each sub-block before adding the residual information. We use Adam optimizer with β1 = 0.9, β2 = 0.998, ϵ = 1e-8 on 4 NVIDIA A100 GPUs. The batch size is set to 128 during the training process. The learning rate is set by a warm-up strategy. We perform decoding using beam search with a beam size of 10. ... when α is set to 0.7 and β is set to 0.1, both ASR and AST tasks can achieve satisfactory results. All subsequent experiments use these parameter settings. |
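The setup quoted above can be collected into a single configuration sketch. Everything in `CONFIG` is taken directly from the paper's reported settings; the `noam_lr` schedule is an assumption (the paper only says the learning rate "is set by a warm-up strategy"), and `spec_augment` is a simplified illustration of the masking described in Park et al. 2019, not the authors' implementation.

```python
import random

# Hyper-parameters as reported in the paper's experiment setup.
CONFIG = {
    "attention_dim": 512,      # encoder and decoder attention dimension
    "num_heads": 4,
    "ffn_dim": 1024,           # position-wise feed-forward dimension
    "encoder_blocks": 12,      # acoustic encoder blocks
    "decoder_blocks": 6,
    "label_smoothing": 0.1,
    "residual_dropout": 0.1,
    "adam": {"beta1": 0.9, "beta2": 0.998, "eps": 1e-8},
    "batch_size": 128,
    "beam_size": 10,
    "loss_alpha": 0.7,         # weights the paper settles on for ASR/AST
    "loss_beta": 0.1,
}

def noam_lr(step, d_model=512, warmup=25000):
    """Transformer 'Noam' warm-up schedule. ASSUMPTION: the paper does not
    name its schedule or warm-up steps; these values are illustrative."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

def spec_augment(spec, F=30, mF=2, T=40, mT=2):
    """Zero out mF random frequency bands (width <= F) and mT random time
    bands (width <= T) of a spec[time][freq] spectrogram, matching the
    SpecAugment settings quoted above. Simplified in-place illustration."""
    n_t, n_f = len(spec), len(spec[0])
    for _ in range(mF):                        # frequency masking
        width = random.randint(0, min(F, n_f))
        f0 = random.randint(0, n_f - width)
        for row in spec:
            for j in range(f0, f0 + width):
                row[j] = 0.0
    for _ in range(mT):                        # time masking
        width = random.randint(0, min(T, n_t))
        t0 = random.randint(0, n_t - width)
        for i in range(t0, t0 + width):
            spec[i] = [0.0] * n_f
    return spec
```

Under the Noam schedule the learning rate rises linearly for `warmup` steps and then decays with the inverse square root of the step count, peaking at the warm-up boundary.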