Abstract
When translating text where sentiment is the main message, human translators
give particular attention to sentiment-carrying words, because an incorrect
translation of such words would miss the fundamental aspect of the source
text, i.e. the author's sentiment. In the online world, machine translation (MT) systems are
extensively used to translate User-Generated Content (UGC) such as reviews,
tweets, and social media posts, where the main message is often the author's
positive or negative attitude towards the topic of the text. In such
scenarios, it is important to measure accurately how far an MT system can be
relied upon in real-life use to transfer the correct affect message. This paper
tackles an under-recognised problem in machine translation evaluation:
judging to what extent automatic metrics concur with the gold standard of
human evaluation for a correct translation of sentiment. We
evaluate the efficacy of conventional quality metrics in spotting a
mistranslation of sentiment, especially when it is the sole error in the MT
output. We propose a numerical `sentiment-closeness' measure appropriate for
assessing the accuracy of a translated affect message in UGC text by an MT
system. We show that incorporating this sentiment-aware measure can
significantly enhance the correlation of some available quality metrics with
the human judgement of an accurate translation of sentiment.