How Translation Alters Sentiment
Authors: Saif M. Mohammad, Mohammad Salameh, Svetlana Kiritchenko
JAIR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper we systematically examine both options. We use Arabic social media posts as a stand-in for the focus language text. We show that sentiment analysis of English translations of Arabic texts produces competitive results w.r.t. Arabic sentiment analysis. We show that Arabic sentiment analysis systems benefit from the use of automatically translated English sentiment lexicons. We also conduct manual annotation studies to examine why the sentiment of a translation is different from the sentiment of the source word or text. |
| Researcher Affiliation | Academia | Saif M. Mohammad EMAIL National Research Council Canada Mohammad Salameh EMAIL University of Alberta Svetlana Kiritchenko EMAIL National Research Council Canada |
| Pseudocode | No | The paper describes algorithms like the PMI-based sentiment scoring for lexicon generation, but it presents them as formulas and descriptive text rather than formal pseudocode or algorithm blocks. For example, Sentiment Score(w) = PMI(w, pos) − PMI(w, neg) (2) is a formula, not an algorithm block. |
| Open Source Code | No | The paper states that sentiment lexicons and sentiment-labeled corpora are made freely available (e.g., "All of these sentiment lexicons and sentiment-labeled corpora are made freely available.1"), with links provided. However, it does not explicitly state that the source code for the sentiment analysis methodology (their implementation of the NRC-Canada system or other experimental code) is openly available. |
| Open Datasets | Yes | All of these sentiment lexicons and sentiment-labeled corpora are made freely available.1 1. http://www.purl.org/net/Arabic SA We use a randomly chosen subset of 1200 Levantine dialectal sentences, which we will refer to as the BBN posts or BBN dataset, in our experiments. We also conduct experiments on a dataset of 2000 tweets originating from Syria (a country where Levantine dialectal Arabic is commonly spoken). These tweets were collected in May 2014 by polling the Twitter API. Since manual translation of text from Arabic to English is a costly exercise, we chose, for our experiments, an existing Arabic social media dataset that has already been translated: the BBN Arabic-Dialect English Parallel Text (Zbib, Malchiodi, Devlin, Stallard, Matsoukas, Schwartz, Makhoul, Zaidan, & Callison-Burch, 2012).5 It contains about 3.5 million tokens of Arabic dialect sentences and their English translations. 5. https://catalog.ldc.upenn.edu/LDC2012T09 |
| Dataset Splits | Yes | In our automatic sentiment analysis experiments, the focus language corpora (BBN dataset and Syria dataset) are each split into test and training folds as part of cross-validation experiments. Table 5 shows ten-fold cross-validation accuracies obtained on the BBN and Syria datasets using the various Arabic sentiment lexicons discussed above. |
| Hardware Specification | No | The paper discusses the use of a machine translation system (Portage) and Google Translate, along with various software tools for text processing and sentiment analysis. However, it does not specify any hardware components such as CPU models, GPU models, or memory used for running the experiments or training the models. |
| Software Dependencies | No | The paper mentions several software tools and systems: MADA 3.2, CMU Twitter NLP tool, Portage (an in-house MT system), Google Translate, and a linear-kernel Support Vector Machine (citing Chang & Lin, 2011, which implies LIBSVM). However, only MADA is given a specific version number (3.2). The other key software components are mentioned without their versions, which is insufficient for full reproducibility. |
| Experiment Setup | No | The paper mentions that a linear-kernel Support Vector Machine classifier is used and describes feature sets (word and character ngrams, POS, negation, sentiment lexicons, style features). It also mentions some parameters for the MT decoder like "stack size to 10000 and distortion limit to 7" and a "KN-smoothed 5-gram language model". However, it lacks specific hyperparameters for the SVM classifier itself, such as the regularization parameter C or the feature weighting scheme, which are crucial for replicating the sentiment analysis experiments. |
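The PMI-based scoring noted under "Pseudocode" can be sketched as follows. This is a minimal stdlib-only illustration of Equation (2), Sentiment Score(w) = PMI(w, pos) − PMI(w, neg), which with count notation reduces to log2((c(w, pos) · N_neg) / (c(w, neg) · N_pos)); the whitespace tokenization and add-one smoothing are assumptions for the sketch, not details taken from the paper.

```python
import math
from collections import Counter

def pmi_sentiment_lexicon(docs, labels, smoothing=1.0):
    """Assign each word a score of PMI(w, pos) - PMI(w, neg).

    Positive scores indicate association with the positive class,
    negative scores with the negative class. Smoothing avoids
    division by zero for words seen in only one class.
    """
    pos, neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        target = pos if label == "positive" else neg
        target.update(doc.split())  # assumed tokenization: whitespace split
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    lexicon = {}
    for w in set(pos) | set(neg):
        # PMI difference collapses to a smoothed count ratio
        num = (pos[w] + smoothing) * (n_neg + smoothing)
        den = (neg[w] + smoothing) * (n_pos + smoothing)
        lexicon[w] = math.log2(num / den)
    return lexicon
```

On a toy corpus, a word seen only in positive posts (e.g. "great") receives a positive score and a word seen only in negative posts a negative one, matching the lexicon-induction behaviour the paper describes.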
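The "Experiment Setup" row mentions word and character n-gram features feeding a linear-kernel SVM. A stdlib-only sketch of such a feature extractor is below; the specific n-gram ranges (word 1–3, character 3–5) and whitespace tokenization are assumptions for illustration and are not confirmed by the paper.

```python
from collections import Counter

def ngram_features(text, word_n=(1, 3), char_n=(3, 5)):
    """Count word and character n-grams as sparse SVM features.

    Keys are prefixed 'w:' / 'c:' so the two feature families
    do not collide in a single feature space.
    """
    feats = Counter()
    tokens = text.split()
    # word n-grams over the assumed range
    for n in range(word_n[0], word_n[1] + 1):
        for i in range(len(tokens) - n + 1):
            feats["w:" + " ".join(tokens[i:i + n])] += 1
    # character n-grams over the raw string, including spaces
    for n in range(char_n[0], char_n[1] + 1):
        for i in range(len(text) - n + 1):
            feats["c:" + text[i:i + n]] += 1
    return feats
```

These sparse count dictionaries would then be vectorized and passed to a linear-kernel SVM (the paper cites Chang & Lin, 2011, i.e. LIBSVM); the classifier itself is omitted here since its hyperparameters are exactly what the row reports as unspecified.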