Adaptive Few-shot Prompting for Machine Translation with Pre-trained Language Models
Authors: Lei Tang, Jinghui Qin, Wenxuan Ye, Hao Tan, Zhijing Yang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, extensive experiments on the proposed diplomatic Chinese-English parallel dataset and the United Nations Parallel Corpus (Chinese-English part) show the effectiveness and superiority of our proposed AFSP. |
| Researcher Affiliation | Academia | Lei Tang1, Jinghui Qin1*, Wenxuan Ye2, Hao Tan1, Zhijing Yang1; 1Guangdong University of Technology, 2The Chinese University of Hong Kong |
| Pseudocode | No | The paper describes the Adaptive Few-shot Prompting (AFSP) framework and its components (translation demonstration retrieval module, rerank module) in detail, including mathematical formulations for embedding calculation and relevance scores. However, it does not present this information in a formally structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology will be released, nor does it provide a link to a code repository or supplementary materials containing code. |
| Open Datasets | Yes | To validate the effectiveness of the proposed AFSP, we first crawled a high-quality parallel Chinese-English dataset named Diplomatic corpus from the China Diplomatic website. The Diplomatic corpus consists of speeches made by spokespersons during routine press conferences, including questions posed by journalists and responses from Chinese spokespersons on a range of diplomatic issues. ... The first is accessibility. All data is publicly available on the China Diplomatic website and can be easily found online. ... Besides, we also use a Chinese-English subset from the UN Open Corpus v1.0 as the second testbed, which can show the universality of the proposed AFSP. The UN Parallel Corpus is a parallel corpus that includes official UN documents and statements from meetings. |
| Dataset Splits | Yes | For both datasets, Diplomatic and UN, we randomly selected 500 parallel translation pairs to serve as the test set for evaluating AFSP. The remaining pairs are used as the demonstration corpus for adaptive demonstration retrieval. |
| Hardware Specification | Yes | The k for few-shot prompts is set to 3 due to the limited memory of NVIDIA RTX 3090. |
| Software Dependencies | Yes | We deploy BERT (Devlin et al. 2019) as the backbone of the SLM in the Re-ranker. For Chinese-English translation, we use Bert-large-cased, while we use Bert-base-Chinese as the SLM for English-Chinese translation. |
| Experiment Setup | Yes | The α1, α2, and α3 are set to 0.4, 0.4, 0.2 for computing the final relevance score srank. The k for few-shot prompts is set to 3 due to the limited memory of NVIDIA RTX 3090. For the closed-source ChatGPT-3.5-turbo-0125, we deploy ChatGLM3-6B as the embedding model for hybrid demonstration retrieval. We conduct top-30 sampling for ChatGLM3-6B, InternLM2-7B, and Llama3-8B and top-5 sampling for ChatGPT-3.5-turbo-0125. |
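The reported setup (α1 = 0.4, α2 = 0.4, α3 = 0.2, and k = 3 few-shot demonstrations) can be sketched as a simple weighted rerank. This is a minimal sketch, not the paper's implementation: the component scores `s1`–`s3` are hypothetical placeholders, since the quoted text gives the mixing weights but not the exact score definitions.

```python
def rerank_demonstrations(candidates, alphas=(0.4, 0.4, 0.2), k=3):
    """Combine three relevance scores into srank and keep the top-k demos.

    candidates: list of dicts carrying hypothetical component scores
    "s1", "s2", "s3" (e.g. source-side similarity, target-side similarity,
    and an SLM reranker score -- assumed names, not from the paper).
    """
    a1, a2, a3 = alphas
    # srank = a1*s1 + a2*s2 + a3*s3, as in the reported weighting
    scored = [(a1 * c["s1"] + a2 * c["s2"] + a3 * c["s3"], c) for c in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [c for _, c in scored[:k]]

# Toy candidate pool with made-up scores
demos = [
    {"id": "d1", "s1": 0.90, "s2": 0.80, "s3": 0.50},
    {"id": "d2", "s1": 0.60, "s2": 0.90, "s3": 0.80},
    {"id": "d3", "s1": 0.40, "s2": 0.40, "s3": 0.40},
    {"id": "d4", "s1": 0.95, "s2": 0.20, "s3": 0.30},
]
top = rerank_demonstrations(demos)
print([d["id"] for d in top])  # the k=3 highest-srank demonstrations
```

The selected k demonstrations would then be formatted into the few-shot prompt; with k fixed at 3, the prompt length stays within the memory budget of a single RTX 3090 as the paper notes.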