Bilingual Distributed Word Representations from Document-Aligned Comparable Data

Authors: Ivan Vulić, Marie-Francine Moens

JAIR 2016

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. "We demonstrate the utility of the induced BWEs in two semantic tasks: (1) bilingual lexicon extraction, (2) suggesting word translations in context for polysemous words. Our simple yet effective BWE-based models significantly outperform the MuPTM-based and context-counting representation models from comparable data as well as prior BWE-based models, and acquire the best reported results on both tasks for all three tested language pairs."
Researcher Affiliation: Academia. "Ivan Vulić (EMAIL), University of Cambridge, Department of Theoretical and Applied Linguistics, 9 West Road, CB3 9DP, Cambridge, UK; Marie-Francine Moens (EMAIL), KU Leuven, Department of Computer Science, Celestijnenlaan 200A, 3001 Heverlee, Belgium"
Pseudocode: No. The paper describes methods and models using mathematical formulations and textual explanations, but it does not include a clearly labeled pseudocode or algorithm block.
Open Source Code: Yes. "We will make our pre-training and training code for BWESG publicly available, along with all BWESG-based bilingual word embeddings for the three language pairs at: http://liir.cs.kuleuven.be/software.php."
Open Datasets: Yes. "To induce bilingual word embeddings as well as to be directly comparable with baseline representations from prior work, we use a dataset comprising a subset of comparable Wikipedia data available in three language pairs (Vulić & Moens, 2013b, 2014)... Available online: people.cs.kuleuven.be/~ivan.vulic/software/ ... we use Europarl.v7 (Koehn, 2005) for all three language pairs obtained from the OPUS website (Tiedemann, 2012): http://opus.lingfil.uu.se/"
Dataset Splits: Yes. "Test Data: For each language pair, we evaluate on standard 1,000 ground truth one-to-one translation pairs built for the three language pairs (ES/IT/NL-EN) by Vulić and Moens (2013a, 2013b). Test Data: We use the SWTC test set introduced recently (Vulić & Moens, 2014). The test set comprises 15 polysemous nouns in three languages (ES, IT and NL) along with sets of their translation candidates (i.e., sets TC). For each polysemous noun, the test sets provide 24 sentences extracted from Wikipedia which illustrate different senses and translations of the pivot polysemous noun, accompanied by the annotated correct translation for each sentence. This yields 360 test sentences for each language pair (and 1,080 test sentences in total). An additional set of 100 IT sentences (5 other polysemous IT nouns plus 20 sentences for each noun) is used as a development set to tune the parameter λ (see Section 5.1) for all language pairs and all models in comparison."
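The test-set sizes quoted above are internally consistent, which can be checked with a few lines of arithmetic (a minimal sketch; the variable names are illustrative, not from the paper):

```python
# Reported SWTC test-set composition (Vulić & Moens, 2014)
nouns_per_language = 15      # polysemous pivot nouns per language (ES, IT, NL)
sentences_per_noun = 24      # Wikipedia sentences illustrating different senses
language_pairs = 3           # ES-EN, IT-EN, NL-EN

per_pair = nouns_per_language * sentences_per_noun   # test sentences per pair
total = per_pair * language_pairs                    # test sentences in total

# Development set: 5 extra IT nouns, 20 sentences each, used to tune lambda
dev_sentences = 5 * 20

print(per_pair, total, dev_sentences)  # 360 1080 100
```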
Hardware Specification: Yes. "Typically, several hours are needed to train BWESG with d = 300 and cs = 48-60, whereas it takes two to three days to train a bilingual topic model with K = 2000 on the same training set using the multi-threaded architectures on 10 Intel(R) Xeon(R) CPU E5-2667 2.90GHz processors."
Software Dependencies: No. The paper mentions software such as "SGNS from the word2vec package" and "TreeTagger (Schmid, 1994)", but does not provide specific version numbers for these tools to ensure reproducibility of the software environment.
Experiment Setup: Yes. "All parameters are set to the default suggested parameters for SGNS from the word2vec package: stochastic gradient descent (SGD) with a linearly decreasing global learning rate of 0.025, 25 negative samples, subsampling rate 1e-4, and 15 epochs. We have varied the number of dimensions d = 100, 200, 300. We have also trained BWESG with d = 40 to be directly comparable to readily available sets of BWEs from prior work (Chandar et al., 2014). Moreover, to test the effect of window size on the final results, i.e., the number of positives used for training, we have varied the maximum window size cs from 4 to 60 in steps of 4."
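The hyperparameter sweep described above can be enumerated as follows (a sketch only; the dictionary keys are illustrative labels, not word2vec command-line flags, and the paper's extra d = 40 run is noted but left out of the main grid):

```python
from itertools import product

dims = [100, 200, 300]                # embedding dimensionality d (a d = 40 run is also reported)
window_sizes = list(range(4, 61, 4))  # maximum window size cs: 4, 8, ..., 60

# Fixed SGNS settings quoted from the paper (word2vec defaults)
base = {
    "learning_rate": 0.025,  # linearly decreasing global SGD rate
    "negative": 25,          # negative samples
    "subsample": 1e-4,       # subsampling rate
    "epochs": 15,
}

# One configuration per (d, cs) combination
grid = [dict(base, d=d, cs=cs) for d, cs in product(dims, window_sizes)]
print(len(grid))  # 3 dims x 15 window sizes = 45 configurations
```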