Mandatory Fields

Authors

Peyman Passban, Chris Hokamp, Andy Way and Qun Liu

Conference Title

EAMT 2016

Title of Paper

Improving Phrase-Based SMT Using Cross-Granularity Embedding Similarity

Year

2016

Month

May

Status

Published

Peer Reviewed

Times Cited

Optional Fields

Search Keyword

Statistical machine translation, phrase embeddings, incorporating contextual information.

Editors

Start Page

129

End Page

140

Location

Riga, Latvia

Start Date

30-MAY-16

End Date

01-JUN-16

Abstract

The phrase-based statistical machine translation (PBSMT) model can be viewed as a log-linear combination of translation and language model features. Such a model typically relies on the phrase table as the main resource for bilingual knowledge, which in its most basic form consists of aligned phrases, along with four probability scores. These scores only indicate the co-occurrence of phrase pairs in the training corpus, and not necessarily their semantic relatedness. The basic phrase table is also unable to incorporate contextual information about the segments where a particular phrase tends to occur. In this paper, we define six new features which express the semantic relatedness of bilingual phrases. Our method utilizes both source and target side information to enrich the phrase table. The new features are inferred from a bilingual corpus by a neural network (NN). We evaluate our model on the English–Farsi (En–Fa) and English–Czech (En–Cz) pairs and observe considerable improvements in all En↔Fa and En↔Cz directions.

Funded By

URL

http://www.aclweb.org/anthology/W16-3403

DOI Link

Grant Details

Funding Body

Science Foundation Ireland (SFI)

Grant Details

12/CE/I2267