Conference Publication Details
Mandatory Fields
Vanmassenhove E.;Du J.;Way A.
Computational Linguistics in the Netherlands Journal
Investigating 'aspect' in NMT and SMT: Translating the English simple past and present perfect
2017
December
Published
1
Optional Fields
109
127
© 2017 Eva Vanmassenhove, Jinhua Du and Andy Way. One of the important differences between English and French grammar is related to how their verbal systems handle aspectual information. While the English simple past tense is aspectually neutral, the French and Spanish past tenses are linked with a particular imperfective/perfective aspect. This study examines what Statistical Machine Translation (SMT) and Neural Machine Translation (NMT) learn about 'aspect' and how this is reflected in the translations they produce. We use their main knowledge sources, phrase-tables (SMT) and encoding vectors (NMT), to examine what kind of aspectual information they encode. Furthermore, we examine whether this encoded 'knowledge' is actually transferred during decoding and thus reflected in the actual translations. Our study is based on the translations of the English simple past and present perfect tenses into French and Spanish imperfective and perfective past tenses. We examine the interaction between the lexical aspect of English simple past verbs and the grammatical aspect expressed by the tense in the French/Spanish translations. The results show that SMT phrase-tables contain information about the basic lexical aspect of verbs. Although lexical aspect is often closely related to the grammatical aspect expressed by the French and Spanish tenses, for some verbs (mainly atelic dynamic verbs) more contextual information is required in order to select an appropriate tense. The SMT n-grams provide insufficient context to grasp other aspectual factors included in the sentence and so cannot consistently select the tense with the appropriate aspectual value. On the other hand, the encoding vectors produced by our NMT system do contain information about the entire sentence. An analysis based on the English NMT encoding vectors shows that a logistic regression model can obtain an accuracy of 90% when trying to predict the correct tense based on the encoding vectors. However, these positive results are not entirely reflected in the actual translations, i.e. part of the aspectual information is lost during decoding.
Grant Details