Abstract
Proceedings of the Fifth Arabic Natural Language Processing Workshop (WANLP 2020)

Since the advent of Neural Machine Translation (NMT) approaches, there has
been a tremendous improvement in the quality of automatic translation. However,
NMT output still lacks accuracy in some low-resource languages and sometimes
makes major errors that need extensive post-editing. This is particularly
noticeable with texts that do not follow common lexico-grammatical standards,
such as user-generated content (UGC). In this paper, we investigate the
challenges involved in translating book reviews from Arabic into English, with
particular focus on the errors that lead to incorrect translation of sentiment
polarity. Our study points to the special characteristics of Arabic UGC,
examines the sentiment transfer errors Google Translate makes when translating
Arabic UGC into English, analyzes why these errors occur, and proposes an error
typology specific to the translation of Arabic UGC. Our analysis shows that the output
of online translation tools for Arabic UGC can either fail to transfer the
sentiment at all, producing a neutral target text, or completely flip the
sentiment polarity of the target word or phrase and hence deliver a wrong
affect message. We address this problem by fine-tuning an NMT model with
respect to sentiment polarity, showing that this approach can significantly help
with correcting sentiment errors detected in the online translation of Arabic
UGC.