Mandatory Fields

Authors

Banerjee P.; Naskar S.; Roturier J.; Way A.; Van Genabith J.

Conference Title

COLING 2012: 24th International Conference on Computational Linguistics Technical Papers

Title of Paper

Translation Quality-Based Supplementary Data Selection by Incremental Update of Translation Models

Year

2012

Month

December

Status

Published

Peer Reviewed

Optional Fields

Search Keyword

Domain adaptation; Incremental update; Model merging; Statistical machine translation; Supplementary data selection

Editors

Start Page

149

End Page

166

Location

Mumbai, India

Start Date

End Date

Abstract

Supplementary data selection from out-of-domain or related-domain data is a well-established technique in domain adaptation of statistical machine translation. The selection criteria for such data are mostly based on measures of similarity with available in-domain data, but not directly in terms of translation quality. In this paper, we present a technique for selecting supplementary data to improve translation performance, directly in terms of translation quality, as measured by automatic evaluation metric scores. Batches of data selected from out-of-domain corpora are incrementally added to an existing baseline system and evaluated in terms of translation quality on a development set. A batch is selected only if its inclusion improves translation quality. To assist the process, we present a novel translation model merging technique that allows rapid retraining of the translation models with incremental data. When incorporated into the 'in-domain' translation models, the final cumulatively selected datasets are found to provide statistically significant improvements for a number of different supplementary datasets. Furthermore, the translation model merging technique is found to perform on a par with state-of-the-art methods of phrase-table combination. © 2012 The COLING.
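
The batch selection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `select_batches`, the `Batch` type, and the `dev_score` callback are all hypothetical. The callback is assumed to build or update a system on the given data and return a development-set metric score where higher is better. The record does not say how batches are formed, which metric is used, or how the model merging technique works, so none of that is modelled here.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a batch is a list of (source, target) pairs.
Batch = List[Tuple[str, str]]


def select_batches(
    baseline: Batch,
    batches: Sequence[Batch],
    dev_score: Callable[[Batch], float],
) -> Tuple[List[int], float]:
    """Greedily keep each batch only if adding it raises the dev-set score.

    dev_score is an assumed callback: it trains (or incrementally updates)
    a system on the given data and returns an automatic metric score on a
    development set, with higher meaning better.
    """
    selected: List[int] = []
    training_data = list(baseline)
    best = dev_score(training_data)
    for index, batch in enumerate(batches):
        candidate = training_data + list(batch)
        score = dev_score(candidate)
        if score > best:  # keep the batch only on a strict improvement
            training_data = candidate
            best = score
            selected.append(index)
    return selected, best
```

In this sketch every candidate batch triggers one call to `dev_score`, which is why fast retraining matters: the abstract's model merging technique is presented as the way to make that step cheap.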

Funded By

URL

https://www.aclweb.org/anthology/C12-1010

DOI Link

Grant Details

Funding Body

Grant Details

Science Foundation Ireland (Grant No. 07/CE/I1142) as part of the Centre for Next Generation Localisation (www.cngl.ie)