DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory
Authors: Yutong Wang, Jiali Zeng, Xuebo Liu, Derek Wong, Fandong Meng, Jie Zhou, Min Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results indicate that DELTA significantly outperforms strong baselines in terms of translation consistency and quality across four open/closed-source LLMs and two representative document translation datasets, achieving an increase in consistency scores by up to 4.58 percentage points and in COMET scores by up to 3.16 points on average. |
| Researcher Affiliation | Collaboration | ¹Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China ²Pattern Recognition Center, WeChat AI, Tencent Inc., China ³NLP2CT Lab, Department of Computer and Information Science, University of Macau |
| Pseudocode | Yes | The main framework of DELTA is illustrated in Figure 1, the algorithm of DELTA is detailed in Algorithm 1, and the prompts used for each module are given in Appendix C. |
| Open Source Code | Yes | The code and data of our approach are released at https://github.com/YutongWang1216/DocMTAgent. |
| Open Datasets | Yes | We conduct our experiments on the two test sets. The first is the tst2017 test sets from the IWSLT2017 translation task (Akiba et al., 2004), which consists of parallel documents sourced from TED talks, covering 12 language pairs. ... The second is Guofeng Webnovel (Wang et al., 2023c; 2024b), a high-quality and discourse-level corpus of web fiction. |
| Dataset Splits | Yes | We conduct our experiments on the two test sets. The first is the tst2017 test sets from the IWSLT2017 translation task... The second is Guofeng Webnovel... We conduct our experiments on the Guofeng V1 TEST 2 set in the Zh→En direction. |
| Hardware Specification | Yes | As shown in Figure 3, we compared the memory usage by utilizing Qwen2-72B-Instruct to translate a document in En→Zh on a device with 2 NVIDIA A800 80GB GPUs. |
| Software Dependencies | Yes | In this work, we utilize two versions of GPT models, GPT-3.5-Turbo-0125 and GPT-4o-mini, as our base models. ... We also introduce the open-source Qwen2-7B-Instruct and Qwen2-72B-Instruct in our experiments. ... We utilize two neural metrics to assess the quality of document translation. The first is the sentence-level COMET (sCOMET) score, for which we utilize the model Unbabel/wmt22-comet-da to obtain the scores. The second metric is the document-level COMET (dCOMET) score proposed by Vernikos et al. (2022), for which we use wmt21-comet-qe-mqm to derive reference-free scores. |
| Experiment Setup | Yes | The max new tokens is set to 2048 and other hyper-parameters remain default. The updating window of Bilingual summary m and length of Long-Term Memory l are set to 20. The number of retrieved relative sentences from Long-Term Memory n is set to 2. The length of Short-Term Memory k is set to 3. |
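The hyper-parameters in the last row (Short-Term Memory length k=3, Long-Term Memory length l=20, n=2 retrieved sentences) can be illustrated with a minimal sketch of a two-level translation memory. This is a hypothetical implementation for clarity only: the class name, the deque-based storage, and the word-overlap retrieval heuristic are assumptions, not the paper's actual mechanism.

```python
from collections import deque

class TranslationMemory:
    """Illustrative two-level memory with the paper's reported sizes
    (k=3, l=20, n=2). The retrieval heuristic below is a stand-in."""

    def __init__(self, k=3, l=20, n=2):
        self.short_term = deque(maxlen=k)  # last k source/target pairs, always in context
        self.long_term = deque(maxlen=l)   # last l pairs available for retrieval
        self.n = n                         # number of pairs retrieved per query

    def add(self, src, tgt):
        # Every translated sentence pair enters both memory levels.
        self.short_term.append((src, tgt))
        self.long_term.append((src, tgt))

    def retrieve(self, query):
        # Rank long-term entries by word overlap with the query sentence
        # and return the top-n most similar pairs (a simple proxy for
        # whatever similarity measure the agent actually uses).
        q_words = set(query.lower().split())
        def overlap(pair):
            return len(set(pair[0].lower().split()) & q_words)
        return sorted(self.long_term, key=overlap, reverse=True)[:self.n]
```

With these defaults, only the three most recent pairs stay in the short-term window, while older pairs remain retrievable from the long-term store until it exceeds 20 entries.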