Mandatory Fields

Authors

Vanmassenhove E.; Du J.; Way A.

Conference Title

Computational Linguistics in the Netherlands Journal

Title of Paper

Investigating 'aspect' in NMT and SMT: Translating the English simple past and present perfect

Year

2017

Month

December

Status

Published

Peer Reviewed

Times Cited

Optional Fields

Search Keyword

Editors

Start Page

109

End Page

127

Location

Start Date

End Date

Abstract

© 2017 Eva Vanmassenhove, Jinhua Du and Andy Way. One of the important differences between English and French grammar is related to how their verbal systems handle aspectual information. While the English simple past tense is aspectually neutral, the French and Spanish past tenses are linked with a particular imperfective/perfective aspect. This study examines what Statistical Machine Translation (SMT) and Neural Machine Translation (NMT) learn about 'aspect' and how this is reflected in the translations they produce. We use their main knowledge sources, phrase-tables (SMT) and encoding vectors (NMT), to examine what kind of aspectual information they encode. Furthermore, we examine whether this encoded 'knowledge' is actually transferred during decoding and thus reflected in the actual translations. Our study is based on the translations of the English simple past and present perfect tenses into French and Spanish imperfective and perfective past tenses. We examine the interaction between the lexical aspect of English simple past verbs and the grammatical aspect expressed by the tense in the French/Spanish translations. It results that SMT phrase-tables contain information about the basic lexical aspect of verbs. Although lexical aspect is often closely related to the grammatical aspect expressed by the French and Spanish tenses, for some verbs (mainly atelic dynamic verbs) more contextual information is required in order to select an appropriate tense. The SMT n-grams provide insufficient context to grasp other aspectual factors included in the sentence to consistently select the tense with the appropriate aspectual value. On the other hand, the encoding vectors produced by our NMT system do contain information about the entire sentence. An analysis based on the English NMT encoding vectors shows that a logistic regression model can obtain an accuracy of 90% when trying to predict the correct tense based on the encoding vectors. However, these positive results are not entirely reflected in the actual translations, i.e. part of the aspectual information is lost during decoding.

Funded By

URL

DOI Link

Grant Details

Funding Body