Conference Publication Details
Mandatory Fields
Banerjee P.; Naskar S.; Roturier J.; Way A.; Van Genabith J.
COLING 2012: 24th International Conference on Computational Linguistics Technical Papers
Translation Quality-Based Supplementary Data Selection by Incremental Update of Translation Models
2012
December
Published
1
Optional Fields
Domain adaptation; Incremental update; Model merging; Statistical machine translation; Supplementary data selection
149
166
Mumbai, India
Supplementary data selection from out-of-domain or related-domain data is a well-established technique in domain adaptation of statistical machine translation. The selection criteria for such data are mostly based on measures of similarity with the available in-domain data, rather than directly on translation quality. In this paper, we present a technique for selecting supplementary data that improves translation performance directly in terms of translation quality, as measured by automatic evaluation metric scores. Batches of data selected from out-of-domain corpora are incrementally added to an existing baseline system and evaluated in terms of translation quality on a development set. A batch is selected only if its inclusion improves translation quality. To assist the process, we present a novel translation model merging technique that allows rapid retraining of the translation models with incremental data. When incorporated into the 'in-domain' translation models, the final cumulatively selected datasets are found to provide statistically significant improvements for a number of different supplementary datasets. Furthermore, the translation model merging technique is found to perform on a par with state-of-the-art methods of phrase-table combination. © 2012 The COLING.
https://www.aclweb.org/anthology/C12-1010
Grant Details
Science Foundation Ireland (Grant No. 07/CE/I1142) as part of the Centre for Next Generation Localisation (www.cngl.ie)