Conference Publication Details
Mandatory Fields
Groves D.;Way A.
Machine Translation
Hybrid data-driven models of machine translation
2005
December
Published
1
Optional Fields
Chunk coverage; Convergence; Europarl corpus; Example-based MT; Hybrid; Statistical language models; Statistical MT
301
323
This paper presents an extended, harmonised account of our previous work on combining subsentential alignments from phrase-based statistical machine translation (SMT) and example-based MT (EBMT) systems to create novel hybrid data-driven systems capable of outperforming the baseline SMT and EBMT systems from which they were derived. In previous work, we demonstrated that while an EBMT system is capable of outperforming a phrase-based SMT (PBSMT) system constructed from freely available resources, a hybrid 'example-based' SMT system incorporating marker chunks and SMT subsentential alignments is capable of outperforming both baseline translation models for French-English translation. In this paper, we show that similar gains are to be had from constructing a hybrid 'statistical' EBMT system. Unlike the previous research, here we use the Europarl training and test sets, which are fast becoming the standard data in the field. On these data sets, while all hybrid 'statistical' EBMT variants still fall short of the quality achieved by the baseline PBSMT system, we show that adding the marker chunks to create a hybrid 'example-based' SMT system outperforms the two baseline systems from which it is derived. Furthermore, we provide further evidence in favour of hybrid systems by adding an SMT target-language model to the EBMT system, and demonstrate that this too has a positive effect on translation quality. We also show that many of the subsentential alignments derived from the Europarl corpus are created by either the PBSMT or the EBMT system, but not by both. In sum, therefore, despite the obvious convergence of the two paradigms, the crucial differences between SMT and EBMT contribute positively to the overall translation quality. 
The central thesis of this paper is that any researcher who continues to develop an MT system using either of these approaches will benefit further from integrating the advantages of the other model; dogged adherence to one approach will lead to inferior systems being developed. © Springer Science+Business Media 2007.
10.1007/s10590-006-9015-5
Grant Details