Mandatory Fields

Authors

Passban, P; Liu, Q; Way, A

Year

2017

Month

September

Journal

ACM Transactions on Asian and Low-Resource Language Information Processing

Title

Translating Low-Resource Languages by Vocabulary Adaptation from Close Counterparts

Status

Published

Optional Fields

Search Keyword

Statistical machine translation, neural machine translation, low-resource languages

Abstract

Some natural languages belong to the same family or share similar syntactic and/or semantic regularities. This property persuades researchers to share computational models across languages and benefit from high-quality models to boost existing low-performance counterparts. In this article, we follow a similar idea, whereby we develop statistical and neural machine translation (MT) engines that are trained on one language pair but are used to translate another language. First we train a reliable model for a high-resource language, and then we exploit cross-lingual similarities and adapt the model to work for a close language with almost zero resources. We chose Turkish (Tr) and Azeri or Azerbaijani (Az) as the proposed pair in our experiments. Azeri suffers from lack of resources as there is almost no bilingual corpus for this language. Via our techniques, we are able to train an engine for the Az→English (En) direction, which is able to outperform all other existing models.

Publisher Location

New York

Publisher

Association for Computing Machinery

ISBN / ISSN

2375-4699

DOI Link

10.1145/3099556