The Impact of Syntactic and Semantic Proximity on Machine Translation with Back-Translation
Authors: Nicolas Guerin, Emmanuel Chemla, Shane Steinert-Threlkeld
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We therefore conduct controlled experiments with artificial languages to determine what properties of languages make back-translation an effective training method, covering lexical, syntactic, and semantic properties. We find, contrary to popular belief, that (i) parallel word frequency distributions, (ii) partially shared vocabulary, and (iii) similar syntactic structure across languages are not sufficient to explain the success of back-translation. We show however that even crude semantic signal (similar lexical fields across languages) does improve alignment of two languages through back-translation. |
| Researcher Affiliation | Academia | Nicolas Guerin, Laboratoire de Sciences Cognitives et Psycholinguistique, École Normale Supérieure, PSL University; Emmanuel Chemla, Laboratoire de Sciences Cognitives et Psycholinguistique, École Normale Supérieure, PSL University; Shane Steinert-Threlkeld, Department of Linguistics, University of Washington |
| Pseudocode | No | The paper describes the model components and objectives mathematically and visually in Figure 1 and Figure 2, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology described, nor does it provide a direct link to a code repository. It mentions using 'the exact model introduced by Lample et al. (2018c)' but this refers to a third-party model, not their own implementation's code. |
| Open Datasets | No | For training purposes, we generated two sets of 100,000 sentence structures. All unsupervised training sets are made of (i) one of these sets of sentence structures for the 100,000 sentences in one language (using the language-specific grammar and lexicon), and (ii) the other set to create 100,000 sentences in the other language. Hence, the data are neither labeled for supervision, nor are they even unlabelled translations of one another in principle. The test and validation sets are each composed of 10,000 parallel sentence pairs. These are generated in one language, and then transformed into the second language using the known grammar switches and lexical translations. The paper describes the generation of artificial languages and datasets for the experiments but does not provide specific access information (link, DOI, etc.) for these generated datasets. |
| Dataset Splits | Yes | For training purposes, we generated two sets of 100,000 sentence structures. All unsupervised training sets are made of (i) one of these sets of sentence structures for the 100,000 sentences in one language (using the language-specific grammar and lexicon), and (ii) the other set to create 100,000 sentences in the other language. Hence, the data are neither labeled for supervision, nor are they even unlabelled translations of one another in principle. The test and validation sets are each composed of 10,000 parallel sentence pairs. These are generated in one language, and then transformed into the second language using the known grammar switches and lexical translations. |
| Hardware Specification | Yes | We trained on a Tesla M10 with 8GB memory for roughly one day. |
| Software Dependencies | No | The paper mentions using 'the exact model introduced by Lample et al. (2018c) for NMT', 'Transformer encoder layers and 4 Transformer decoder layers', and the 'FastText algorithm'. However, it does not specify version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | We then train the translation model for 40 epochs, with a batch size of 16 and an Adam optimizer with a 10⁻⁴ learning rate. Those hyper-parameters were chosen based on Lample et al. (2018c) and on our computational capabilities. We trained on a Tesla M10 with 8GB memory for roughly one day. |
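The data-generation scheme quoted in the Dataset Splits row can be made concrete with a minimal sketch: abstract "sentence structures" are sampled, disjoint structure sets are realized in each language for the non-parallel training corpora, and a shared structure set is realized in both languages (via a known grammar switch) for the parallel evaluation pairs. The toy grammar, word lists, and function names below are hypothetical illustrations, not the paper's actual generator, and the corpus sizes are scaled down from the paper's 100,000/10,000.

```python
import random

random.seed(0)

# Hypothetical toy lexicon; the paper's artificial languages are richer.
SUBJECTS = ["dog", "cat", "bird"]
VERBS = ["sees", "likes", "chases"]
OBJECTS = ["ball", "fish", "tree"]

def sample_structure():
    """A 'sentence structure' as an abstract (subject, verb, object) triple."""
    return (random.choice(SUBJECTS), random.choice(VERBS), random.choice(OBJECTS))

def realize_l1(s):
    # Language 1: SVO word order.
    subj, verb, obj = s
    return f"{subj} {verb} {obj}"

def realize_l2(s):
    # Language 2: same structure under a known grammar "switch" (here, SOV order).
    subj, verb, obj = s
    return f"{subj} {obj} {verb}"

N_TRAIN, N_EVAL = 100, 10  # scaled down from 100,000 and 10,000

# Unsupervised training corpora: two *disjoint* sets of structures, one per
# language, so the monolingual corpora are not translations of each other.
train_l1 = [realize_l1(sample_structure()) for _ in range(N_TRAIN)]
train_l2 = [realize_l2(sample_structure()) for _ in range(N_TRAIN)]

# Validation/test: parallel pairs, generated in L1 and mapped into L2
# via the known grammar switch.
eval_pairs = []
for _ in range(N_EVAL):
    s = sample_structure()
    eval_pairs.append((realize_l1(s), realize_l2(s)))
```

The key property this sketch preserves is the one the evidence quote stresses: the training sides are not parallel even in principle, while the evaluation sets are exactly parallel by construction.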