A prerequisite for training corpus-based
machine translation (MT) systems – either Statistical MT (SMT) or Neural MT
(NMT) – is the availability of high-quality
parallel data. This is arguably more important today than ever before, as NMT has
been shown in many studies to outperform
SMT, but mostly when large parallel corpora are available; in cases where data is
limited, SMT can still outperform NMT.
Recently, researchers have shown that
back-translating monolingual data can be
used to create synthetic parallel corpora,
which in turn can be used in combination
with authentic parallel data to train a
high-quality NMT system. Given that large
collections of new parallel text become
available only rarely, back-translation has
become the norm when building
state-of-the-art NMT systems, especially
in resource-poor scenarios.
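To make the back-translation recipe concrete, the following is a minimal sketch of the data-construction step, assuming a trained target-to-source translation function is already available. All names here (build_backtranslated_corpus, reverse_translate, combine) are hypothetical placeholders, not any particular toolkit's API; only the pairing and mixing logic is shown.

```python
from typing import Callable, List, Tuple

def build_backtranslated_corpus(
    mono_target: List[str],
    reverse_translate: Callable[[List[str]], List[str]],
) -> List[Tuple[str, str]]:
    """Back-translate target-language monolingual text into synthetic
    source sentences, yielding synthetic (source, target) pairs."""
    # e.g. an English-to-German model producing synthetic German sources
    synthetic_sources = reverse_translate(mono_target)
    # Pair each machine-generated source with its authentic target side.
    return list(zip(synthetic_sources, mono_target))

def combine(
    authentic: List[Tuple[str, str]],
    synthetic: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Concatenate authentic parallel data with the synthetic corpus."""
    return authentic + synthetic
```

In practice, reverse_translate would wrap a full SMT or NMT decoder; the key point is that the target side of each synthetic pair is human-written text, while the source side is machine output.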
However, we assert that many factors regarding
the actual effects of back-translated data on
the translation capabilities of an NMT model
remain unknown. Accordingly, in this work we
investigate how using back-translated data as
a training corpus – both as a standalone
dataset and in combination with human-generated
parallel data – affects the performance of
an NMT model. We use incrementally larger
amounts of back-translated data to train a
range of NMT systems for German-to-English,
and analyse the resulting translation performance.
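As an illustration of this experimental design, the sketch below enumerates a hypothetical grid of training runs over growing slices of the synthetic corpus, covering both the standalone and the combined settings. The train and evaluate parameters are assumed stand-ins for an NMT training pipeline and a translation-quality metric such as BLEU; nothing here reflects a specific framework.

```python
from typing import Callable, List, Tuple

Corpus = List[Tuple[str, str]]

def run_increments(
    authentic: Corpus,
    synthetic: Corpus,
    sizes: List[int],
    train: Callable[[Corpus], object],
    evaluate: Callable[[object], float],
) -> None:
    """Train and score one system per amount of synthetic data,
    both standalone and mixed with the authentic corpus."""
    for n in sizes:
        for label, corpus in [
            (f"synthetic only, {n} pairs", synthetic[:n]),
            (f"authentic + {n} synthetic pairs", authentic + synthetic[:n]),
        ]:
            model = train(corpus)
            print(f"{label}: score = {evaluate(model):.2f}")
```

Comparing the two rows of this grid at each size is what separates the effect of the synthetic data itself from the effect of simply having more training material.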