Peer-Reviewed Journal Details
Mandatory Fields
Haque, R; Penkale, S; Way, A
2018
February
Language Resources and Evaluation
TermFinder: log-likelihood comparison and phrase-based statistical machine translation models for bilingual terminology extraction
Published
5 ()
Optional Fields
Terminology extraction; Statistical machine translation; Phrase-based statistical machine translation; Log-likelihood; Dice coefficient
52
365
400
Bilingual termbanks are important for many natural language processing applications, especially in translation workflows in industrial settings. In this paper, we apply a log-likelihood comparison method to extract monolingual terminology from the source and target sides of a parallel corpus. The initial candidate terminology list is prepared by taking all arbitrary n-gram word sequences from the corpus. Then, a well-known statistical measure (the Dice coefficient) is employed to remove any multi-word terms with weak associations from the candidate term list. Thereafter, the log-likelihood comparison method is applied to rank the phrasal candidate term list. Next, using a phrase-based statistical machine translation model, we create a bilingual terminology from the extracted monolingual term lists. We integrate an external knowledge source (the Wikipedia cross-language link databases) into the terminology extraction (TE) model to assist two processes: (a) the ranking of the extracted terminology list, and (b) the selection of appropriate target terms for a source term. First, we report the performance of our monolingual TE model compared to a number of state-of-the-art TE models on English-to-Turkish and English-to-Hindi data sets. Then, we evaluate our novel bilingual TE model on an English-to-Turkish data set, and report the automatic evaluation results. We also manually evaluate our novel TE model on English-to-Spanish and English-to-Hindi data sets, and observe excellent performance for all domains.
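The abstract names two statistical measures without giving their formulas. As a rough sketch only, assuming the standard two-word Dice coefficient and a corpus-comparison log-likelihood computed against a separate reference corpus (the abstract does not say which reference is used), the filtering and ranking steps might look like the Python below. The function names, the input layout, and the `min_dice` threshold are hypothetical and not taken from the paper.

```python
import math


def dice(pair_count, left_count, right_count):
    """Standard Dice coefficient for a two-word candidate: 2*f(xy) / (f(x) + f(y))."""
    return 2.0 * pair_count / (left_count + right_count)


def log_likelihood(a, b, c, d):
    """Corpus-comparison log-likelihood (assumed form, not confirmed by the abstract).

    a, b: frequency of the candidate in the domain and reference corpora.
    c, d: total sizes of the domain and reference corpora.
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the domain corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


def rank_candidates(candidates, domain_size, reference_size, min_dice=0.1):
    """Drop weakly associated candidates, then rank the rest by log-likelihood.

    candidates: dict mapping term -> (dice_score, domain_freq, reference_freq).
    min_dice is a hypothetical cut-off chosen for illustration.
    """
    kept = {t: v for t, v in candidates.items() if v[0] >= min_dice}
    return sorted(
        kept,
        key=lambda t: log_likelihood(kept[t][1], kept[t][2], domain_size, reference_size),
        reverse=True,
    )
```

Note that this log-likelihood form is symmetric: a candidate that is over-represented in the reference corpus also scores highly, so a practical ranker would likely also check the direction of the difference.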
Springer Netherlands
1574-020X
https://link.springer.com/article/10.1007/s10579-018-9412-4
10.1007/s10579-018-9412-4
Grant Details
Secretary General Libyan-Sudanese Integration (LSI)