Large Language Models Are Read/Write Policy-Makers for Simultaneous Generation
Authors: Shoutao Guo, Shaolei Zhang, Zhengrui Ma, Yang Feng
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on simultaneous translation and streaming automatic speech recognition tasks show that our method can achieve state-of-the-art performance utilizing the open-source LLMs and demonstrate practicality in real-world scenarios. |
| Researcher Affiliation | Academia | 1Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS) 2 Key Laboratory of AI Safety, Chinese Academy of Sciences 3 University of Chinese Academy of Sciences, Beijing, China |
| Pseudocode | No | The paper describes the method using text and mathematical equations, but does not include a dedicated pseudocode block or algorithm section. |
| Open Source Code | Yes | Code https://github.com/ictnlp/LSG |
| Open Datasets | Yes | WMT15 German-English (De-En): We conduct the SimulT2TT task on this dataset. Consistent with Ma et al. (2020b), we use the newstest2015 set as the test set. MuST-C English-German (En-De): This dataset (Di Gangi et al. 2019) is collected from TED talks and we conduct the SimulT2TT task using its text data. CoVoST2 French-English (Fr-En): We use this dataset (Wang, Wu, and Pino 2020) to conduct both SimulS2TT and streaming ASR tasks. |
| Dataset Splits | Yes | WMT15 German-English (De-En): We conduct the SimulT2TT task on this dataset. Consistent with Ma et al. (2020b), we use the newstest2015 set as the test set. |
| Hardware Specification | Yes | Additionally, for the SimulS2TT task, we evaluate computation-aware latency on an NVIDIA RTX 3090 GPU, which assesses the latency of the systems in practical applications. |
| Software Dependencies | No | The paper mentions several LLMs and methods like Llama2-7B-chat, LoRA, Qwen-Audio, Wav2Vec2-large, Whisper-base, and SacreBLEU, but does not provide specific version numbers for underlying software or libraries like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We set δ = 9.0 and α = 0.6 for De En task, δ = 7.5 and α = 0.6 for En De task, and δ = 7.0 and α = 0.5 for Fr En task. For different latency scenarios, we set [L, U] as [1, 4], [3, 4], [5, 6], and [7, 6], respectively. |
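The reported hyperparameters from the Experiment Setup row can be collected into a small configuration sketch for anyone attempting a re-run. The names (`delta`, `alpha`, `LATENCY_RANGES`, `get_config`) are our own illustration and are not taken from the released LSG repository, which may organize its configuration differently.

```python
# Per-task thresholds reported in the paper: δ and α for the
# De-En, En-De, and Fr-En tasks. Variable names are illustrative.
CONFIGS = {
    "De-En": {"delta": 9.0, "alpha": 0.6},
    "En-De": {"delta": 7.5, "alpha": 0.6},
    "Fr-En": {"delta": 7.0, "alpha": 0.5},
}

# The four [L, U] settings the paper reports for different
# latency scenarios.
LATENCY_RANGES = [(1, 4), (3, 4), (5, 6), (7, 6)]

def get_config(task: str, scenario: int) -> dict:
    """Combine a task's thresholds with one latency scenario."""
    cfg = dict(CONFIGS[task])  # copy so CONFIGS stays unmodified
    cfg["L"], cfg["U"] = LATENCY_RANGES[scenario]
    return cfg
```

For example, `get_config("De-En", 0)` yields the lowest-latency De-En setting with δ = 9.0, α = 0.6, and [L, U] = [1, 4].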