EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation

Authors: Yuqiao Wen, Behzad Shayegh, Chenyang Huang, Yanshuai Cao, Lili Mou

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted experiments on IWSLT (Cettolo et al. 2017) and Europarl (Koehn 2005), two popular multilingual translation datasets for zero-shot machine translation. Results show that EBBS can generate high-quality translations and outperform existing ensemble techniques.
Researcher Affiliation | Collaboration | Yuqiao Wen (1,*), Behzad Shayegh (1), Chenyang Huang (1), Yanshuai Cao (2), Lili Mou (1,3). 1: Dept. of Computing Science, Alberta Machine Intelligence Institute (Amii), University of Alberta; 2: RBC Borealis; 3: Canada CIFAR AI Chair, Amii.
Pseudocode | Yes | We provide the detailed pseudocode for EBBS in Algorithm 1 and an illustration in Figure 1.
Open Source Code | Yes | GitHub: https://github.com/MANGA-UOFA/EBBS
Open Datasets | Yes | We evaluated EBBS on two popular benchmark datasets for zero-shot machine translation: IWSLT (Cettolo et al. 2017), which contains 4 languages (with English) and 6 zero-shot directions; and Europarl v7 (Koehn 2005), which contains 9 languages and 56 zero-shot directions.
Dataset Splits | No | The paper mentions using the IWSLT and Europarl datasets and refers to replicating a previous model's training setup (Liu et al. 2021) and standard practice for selecting subsets for distillation (Fan et al. 2021). However, it does not explicitly state the train/validation/test split percentages or sample counts used in this paper.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments.
Software Dependencies | No | The paper mentions the use of a Transformer architecture and a byte pair encoding tokenizer, but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Specifically, the neural architecture in (Liu et al. 2021) is a 5-layer encoder-decoder Transformer for IWSLT, but has 8 layers for Europarl to accommodate more training data and languages. For EBBS, we used a beam size of five for both upper- and lower-level beams. In our experiment, we implemented standard beam search for comparison, where we also used a beam size of five, following common practice (Meister, Cotterell, and Vieira 2020).
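The comparison baseline above is standard beam search with beam size five: at each decoding step, every partial hypothesis is expanded and only the top-k highest-scoring extensions are kept. As a minimal sketch of that baseline (a toy bigram "model" stands in for the paper's Transformer decoder; this is not the authors' EBBS or their bi-level variant):

```python
import math

def beam_search(start, expand, beam_size=5, max_len=4):
    """Standard beam search: keep the `beam_size` highest-scoring
    (by cumulative log-probability) partial sequences each step.

    `expand(seq)` returns (next_token, probability) pairs for a sequence.
    """
    beams = [(0.0, [start])]  # (log-prob, token sequence)
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            for tok, p in expand(seq):
                candidates.append((logp + math.log(p), seq + [tok]))
        if not candidates:
            break
        # Prune to the k best hypotheses (the "beam").
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_size]
    return beams

# Hypothetical toy bigram model for illustration only:
# the next-token distribution depends on the last token.
bigram = {
    "<s>": [("a", 0.6), ("b", 0.4)],
    "a":   [("a", 0.3), ("b", 0.7)],
    "b":   [("a", 0.5), ("b", 0.5)],
}

hyps = beam_search("<s>", lambda seq: bigram[seq[-1]], beam_size=2, max_len=3)
best_logp, best_seq = hyps[0]
print(best_seq, math.exp(best_logp))  # → ['<s>', 'a', 'b', 'a'] 0.21
```

With beam size two, the greedy prefix `<s> a b` survives pruning and its best continuation wins; a beam of five would simply keep more candidates alive at each step. EBBS differs in that each ensemble component runs its own lower-level beam while an upper-level beam synchronizes them, per Algorithm 1 of the paper.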