Conference Publication Details
Mandatory Fields
Sánchez-Martínez F.; Way A.
Proceedings of the 13th Annual Conference of the European Association for Machine Translation, EAMT 2009
Marker-based filtering of bilingual phrase pairs for SMT
2009
December
Published
1
Optional Fields
144
151
State-of-the-art statistical machine translation systems make use of a large translation table obtained after scoring a set of bilingual phrase pairs automatically extracted from a parallel corpus. The number of bilingual phrase pairs extracted from a pair of aligned sentences grows exponentially as the length of the sentences increases. As a result, the number of entries in the phrase table used to carry out the translation may become unmanageable, especially when online, 'on demand' translation is required in real time. We describe the use of closed-class words to filter the set of bilingual phrase pairs extracted from the parallel corpus, taking into account the alignment information and the type of the words involved in the alignments. On four European language pairs, we show that our simple yet novel approach can reduce the size of the phrase table by up to a third while still providing competitive results compared to the baseline. Furthermore, it provides a good balance between the unfiltered approach and pruning using stop words, where the deterioration in translation quality is unacceptably high. © 2009 European Association for Machine Translation.