Conference Contribution Details
Mandatory Fields
Sheila Castilho & Sharon O'Brien
iMT - Interacting with Machine Translation Workshop at AMTA 2016
Evaluating the Impact of Light Post-Editing on Usability
Austin, Texas
Non Refereed Paper/Abstract
Optional Fields
The increasing use of machine translation (MT) in recent years has resulted in a strong focus on MT evaluation. It is usually assumed that the output of current machine translation systems still requires post-editing and that, when this happens, the end results are of high quality. High quality, in turn, means that machine-translated content is acceptable and usable and that the end user will be satisfied. While automated machine translation becomes ever more pervasive, little is known about how end users engage with raw machine-translated text. This presentation reports on results from experiments to measure the usability of machine-translated content for end users, comparing lightly post-edited content against raw machine translation output for German (DE), Simplified Chinese (ZH) and Japanese (JP) target languages, as well as for the English source language. Usability is defined as “the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use” (ISO 2002). Effectiveness is measured via goal completion, and efficiency is measured via (i) task time and (ii) task time when only completed goals are considered. Satisfaction is defined as “users’ perceptions, feelings, and opinions of the product, usually captured through both written and oral questioning” (Rubin and Chisnell 2011), and as the “freedom from discomfort, and positive attitudes towards the use of the product” (ISO 1998). In order to measure usability, eight tasks (or ‘goals’) were created from Online Help content for a spreadsheet application in collaboration with an industry partner. The tasks were machine translated from English into German, Simplified Chinese and Japanese by the company’s MT system and lightly post-edited by the company’s translation providers. Post-editing was carried out only when terminology and grammatical errors were found in the output.
Fourteen native speakers of German, twenty-one native speakers of Simplified Chinese and twenty-eight native speakers of Japanese were divided into two groups. One group used the lightly post-edited instructions and the second used the raw machine-translated instructions. The English participants, who used the source texts, formed a single group. The participants were asked to follow the instructions and perform the tasks in the spreadsheet user interface, and their interactions were recorded using an eye tracker. After completing the tasks, the participants answered a post-task satisfaction questionnaire to gauge their opinion on how useful the instructions were. A web survey was also implemented to gather a general indication of satisfaction from genuine users of the software on a large scale. The survey was displayed on the industry partner’s website for 140 articles (EN, DE, ZH and JP) and gathered information on ‘how useful’ the content is for the end user. The online survey consisted of only one multiple choice question: “Was this information helpful?” (YES/NO). The main objectives of the experiments were i) to investigate the extent to which light human post-editing of machine translation affects the usability and acceptability of instructional content, and ii) to compare the level of acceptability between German, Simplified Chinese and Japanese as target languages. Results show that the implementation of light post-editing directly influences usability and acceptability. For this MT engine and content, levels of usability and acceptability were found to be higher for German and Simplified Chinese than for Japanese.
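As an illustration of the effectiveness and efficiency measures described above, the following is a minimal sketch of how they might be computed from per-task records. The abstract does not state how the measures were aggregated, so the use of a completion rate and a mean task time is an assumption, the TaskAttempt record and function names are hypothetical, and the sample values are made up rather than taken from the study.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskAttempt:
    """One participant's attempt at one task (hypothetical record layout)."""
    participant: str
    task_id: int
    completed: bool   # whether the goal was reached
    seconds: float    # time spent on the task


def effectiveness(attempts):
    """Share of attempted goals that were completed (0.0 to 1.0)."""
    if not attempts:
        return None
    return sum(a.completed for a in attempts) / len(attempts)


def mean_task_time(attempts, completed_only=False):
    """Mean task time, optionally restricted to completed goals."""
    times = [a.seconds for a in attempts if a.completed or not completed_only]
    return mean(times) if times else None


# Made-up values for illustration only; not data from the study.
attempts = [
    TaskAttempt("P01", 1, True, 60.0),
    TaskAttempt("P01", 2, False, 120.0),
    TaskAttempt("P02", 1, True, 40.0),
    TaskAttempt("P02", 2, True, 80.0),
]

print(effectiveness(attempts))                        # 0.75
print(mean_task_time(attempts))                       # 75.0
print(mean_task_time(attempts, completed_only=True))  # 60.0
```

Computing both time figures side by side shows how much unfinished attempts contribute to the overall average, which is the reason for reporting efficiency in the two ways the abstract names.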