Large Language Models Are Read/Write Policy-Makers for Simultaneous Generation

Authors: Shoutao Guo, Shaolei Zhang, Zhengrui Ma, Yang Feng

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on simultaneous translation and streaming automatic speech recognition tasks show that our method can achieve state-of-the-art performance utilizing the open-source LLMs and demonstrate practicality in real-world scenarios."
Researcher Affiliation | Academia | ¹Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS); ²Key Laboratory of AI Safety, Chinese Academy of Sciences; ³University of Chinese Academy of Sciences, Beijing, China
Pseudocode | No | The paper describes the method using text and mathematical equations, but does not include a dedicated pseudocode block or algorithm section.
Open Source Code | Yes | Code: https://github.com/ictnlp/LSG
Open Datasets | Yes | WMT15 German-English (De-En): "We conduct SimulT2TT task on this dataset. Consistent with Ma et al. (2020b), we use the newstest2015 set as the test set." MuST-C English-German (En-De): "This dataset (Di Gangi et al. 2019) is collected from TED talks and we conduct the SimulT2TT task using its text data." CoVoST2 French-English (Fr-En): "We use this dataset (Wang, Wu, and Pino 2020) to conduct both SimulS2TT and streaming ASR tasks."
Dataset Splits | Yes | WMT15 German-English (De-En): "We conduct SimulT2TT task on this dataset. Consistent with Ma et al. (2020b), we use the newstest2015 set as the test set."
Hardware Specification | Yes | "Additionally, for the SimulS2TT task, we evaluate computation-aware latency on an NVIDIA RTX 3090 GPU, which assesses the latency of the systems in practical applications."
Software Dependencies | No | The paper mentions several LLMs and methods, such as Llama2-7B-chat, LoRA, Qwen-Audio, Wav2Vec2-large, Whisper-base, and SacreBLEU, but does not provide specific version numbers for underlying software or libraries such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | "We set δ = 9.0 and α = 0.6 for the De-En task, δ = 7.5 and α = 0.6 for the En-De task, and δ = 7.0 and α = 0.5 for the Fr-En task. For different latency scenarios, we set [L, U] as [1, 4], [3, 4], [5, 6], and [7, 6], respectively."
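The reported hyperparameters can be collected into a small configuration table for anyone attempting to reproduce the runs. The sketch below only restates the values quoted above; the variable names (`CONFIGS`, `LATENCY_RANGES`, the `delta`/`alpha` keys) are illustrative assumptions, not identifiers from the authors' code.

```python
# Per-task hyperparameters as reported in the paper's experiment setup.
# Key and structure names are hypothetical; only the numbers come from the paper.
CONFIGS = {
    "De-En": {"delta": 9.0, "alpha": 0.6},  # WMT15 SimulT2TT
    "En-De": {"delta": 7.5, "alpha": 0.6},  # MuST-C SimulT2TT
    "Fr-En": {"delta": 7.0, "alpha": 0.5},  # CoVoST2 SimulS2TT / streaming ASR
}

# [L, U] ranges swept to cover the different latency scenarios.
LATENCY_RANGES = [(1, 4), (3, 4), (5, 6), (7, 6)]

for task, cfg in CONFIGS.items():
    print(f"{task}: delta={cfg['delta']}, alpha={cfg['alpha']}")
```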