Peer-Reviewed Journal Details
Mandatory Fields
Passban, P; Liu, Q; Way, A
2016
December
ACM Transactions on Asian and Low-Resource Language Information Processing
Boosting Neural POS Tagger for Farsi Using Morphological Information
Published
3 ()
Optional Fields
ALGORITHM
16
Farsi (Persian) is a low-resource language that suffers from data sparsity and a lack of efficient processing tools. Because they are widely used across natural language processing tasks, part-of-speech (POS) taggers are among the most important of these tools. Despite recent work on Farsi tagging, there is still room for improvement: the best accuracy reported so far is 96%, rising to 96.9% in special cases. The main weakness of existing taggers is their poor handling of out-of-vocabulary (OOV) words. To address both accuracy and OOV handling, we developed a neural network-based POS tagger (NPT) that performs efficiently on Farsi. Despite using less data, NPT outperforms state-of-the-art systems. Our proposed tagger reaches an accuracy of 97.4%, and its performance depends heavily on morphological features. We carry out a shallow morphological analysis and show considerable improvement over the baseline configuration.
NEW YORK
2375-4699
10.1145/2934676
Grant Details