How Translation Alters Sentiment
Authors: Saif M. Mohammad, Mohammad Salameh, Svetlana Kiritchenko
JAIR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper we systematically examine both options. We use Arabic social media posts as a stand-in for the focus language text. We show that sentiment analysis of English translations of Arabic texts produces competitive results w.r.t. Arabic sentiment analysis. We show that Arabic sentiment analysis systems benefit from the use of automatically translated English sentiment lexicons. We also conduct manual annotation studies to examine why the sentiment of a translation is different from the sentiment of the source word or text. |
| Researcher Affiliation | Academia | Saif M. Mohammad EMAIL National Research Council Canada Mohammad Salameh EMAIL University of Alberta Svetlana Kiritchenko EMAIL National Research Council Canada |
| Pseudocode | No | The paper describes algorithms like the PMI-based sentiment scoring for lexicon generation, but it presents them as formulas and descriptive text rather than formal pseudocode or algorithm blocks. For example, Sentiment Score(w) = PMI(w, pos) − PMI(w, neg) (2) is a formula, not an algorithm block. |
| Open Source Code | No | The paper states that sentiment lexicons and sentiment-labeled corpora are made freely available (e.g., "All of these sentiment lexicons and sentiment-labeled corpora are made freely available.1"), with links provided. However, it does not explicitly state that the source code for the sentiment analysis methodology (their implementation of the NRC-Canada system or other experimental code) is openly available. |
| Open Datasets | Yes | All of these sentiment lexicons and sentiment-labeled corpora are made freely available.1 1. http://www.purl.org/net/Arabic SA We use a randomly chosen subset of 1200 Levantine dialectal sentences, which we will refer to as the BBN posts or BBN dataset, in our experiments. We also conduct experiments on a dataset of 2000 tweets originating from Syria (a country where Levantine dialectal Arabic is commonly spoken). These tweets were collected in May 2014 by polling the Twitter API. Since manual translation of text from Arabic to English is a costly exercise, we chose, for our experiments, an existing Arabic social media dataset that has already been translated: the BBN Arabic-Dialect English Parallel Text (Zbib, Malchiodi, Devlin, Stallard, Matsoukas, Schwartz, Makhoul, Zaidan, & Callison-Burch, 2012).5 It contains about 3.5 million tokens of Arabic dialect sentences and their English translations. 5. https://catalog.ldc.upenn.edu/LDC2012T09 |
| Dataset Splits | Yes | In our automatic sentiment analysis experiments, the focus language corpora (BBN dataset and Syria dataset) are each split into test and training folds as part of cross-validation experiments. Table 5 shows ten-fold cross-validation accuracies obtained on the BBN and Syria datasets using the various Arabic sentiment lexicons discussed above. |
| Hardware Specification | No | The paper discusses the use of a machine translation system (Portage) and Google Translate, along with various software tools for text processing and sentiment analysis. However, it does not specify any hardware components such as CPU models, GPU models, or memory used for running the experiments or training the models. |
| Software Dependencies | No | The paper mentions several software tools and systems: MADA 3.2, CMU Twitter NLP tool, Portage (an in-house MT system), Google Translate, and a linear-kernel Support Vector Machine (citing Chang & Lin, 2011, which implies LIBSVM). However, only MADA is given a specific version number (3.2). The other key software components are mentioned without their versions, which is insufficient for full reproducibility. |
| Experiment Setup | No | The paper mentions that a linear-kernel Support Vector Machine classifier is used and describes feature sets (word and character ngrams, POS, negation, sentiment lexicons, style features). It also mentions some parameters for the MT decoder like "stack size to 10000 and distortion limit to 7" and a "KN-smoothed 5-gram language model". However, it lacks specific hyperparameters for the SVM classifier itself, such as the regularization parameter C or the feature weighting scheme, which are crucial for replicating the sentiment analysis experiments. |
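The PMI-based scoring noted under "Pseudocode" can be sketched as follows. This is a minimal stdlib-only illustration of Equation (2), Sentiment Score(w) = PMI(w, pos) − PMI(w, neg), which with count notation reduces to log2((c(w, pos) · N_neg) / (c(w, neg) · N_pos)); the whitespace tokenization and add-one smoothing are assumptions for the sketch, not details taken from the paper.

```python
import math
from collections import Counter

def pmi_sentiment_lexicon(docs, labels, smoothing=1.0):
    """Assign each word a score of PMI(w, pos) - PMI(w, neg).

    Positive scores indicate association with the positive class,
    negative scores with the negative class. Smoothing avoids
    division by zero for words seen in only one class.
    """
    pos, neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        target = pos if label == "positive" else neg
        target.update(doc.split())  # assumed tokenization: whitespace split
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    lexicon = {}
    for w in set(pos) | set(neg):
        # PMI difference collapses to a smoothed count ratio
        num = (pos[w] + smoothing) * (n_neg + smoothing)
        den = (neg[w] + smoothing) * (n_pos + smoothing)
        lexicon[w] = math.log2(num / den)
    return lexicon
```

On a toy corpus, a word seen only in positive posts (e.g. "great") receives a positive score and a word seen only in negative posts a negative one, matching the lexicon-induction behaviour the paper describes.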
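The "Experiment Setup" row mentions word and character n-gram features feeding a linear-kernel SVM. A stdlib-only sketch of such a feature extractor is below; the specific n-gram ranges (word 1–3, character 3–5) and whitespace tokenization are assumptions for illustration and are not confirmed by the paper.

```python
from collections import Counter

def ngram_features(text, word_n=(1, 3), char_n=(3, 5)):
    """Count word and character n-grams as sparse SVM features.

    Keys are prefixed 'w:' / 'c:' so the two feature families
    do not collide in a single feature space.
    """
    feats = Counter()
    tokens = text.split()
    # word n-grams over the assumed range
    for n in range(word_n[0], word_n[1] + 1):
        for i in range(len(tokens) - n + 1):
            feats["w:" + " ".join(tokens[i:i + n])] += 1
    # character n-grams over the raw string, including spaces
    for n in range(char_n[0], char_n[1] + 1):
        for i in range(len(text) - n + 1):
            feats["c:" + text[i:i + n]] += 1
    return feats
```

These sparse count dictionaries would then be vectorized and passed to a linear-kernel SVM (the paper cites Chang & Lin, 2011, i.e. LIBSVM); the classifier itself is omitted here since its hyperparameters are exactly what the row reports as unspecified.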