Peer-Reviewed Journal Details
Mandatory Fields
Peyman Passban, Andy Way, Qun Liu
2016
December
ACM Transactions on Asian Language Information Processing
Boosting Neural POS Tagger for Farsi Using Morphological Information
Published
Optional Fields
Computing methodologies, Natural language processing, Neural networks, POS tagging, Farsi, morphological analysis
16
1
Farsi (Persian) is a low-resource language that suffers from the data sparsity problem and a lack of efficient processing tools. Due to their broad application in natural language processing tasks, part-of-speech (POS) taggers are one of those important tools that should be considered in this respect. Despite recent work on Farsi tagging, there is still room for improvement. The best reported accuracy so far is 96%, which in special cases can rise to 96.9%. The main problem with existing taggers is their inefficiency in coping with out-of-vocabulary (OOV) words. Addressing both problems of accuracy and OOV words, we developed a neural network-based POS tagger (NPT) that performs efficiently on Farsi. Despite using less data, NPT provides better results in comparison to state-of-the-art systems. Our proposed tagger performs with an accuracy of 97.4%, with performance highly influenced by morphological features. We carry out a shallow morphological analysis and show considerable improvement over the baseline configuration.
2375-4699
http://delivery.acm.org/10.1145/2940000/2934676/a4-passban.pdf?ip=136.206.217.57&id=2934676&acc=ACTIVE%20SERVICE&key=846C3111CE4A4710%2E821500BF45340188%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&__acm__=1523022307_85c4565a93a5a7f0244b2e4d44487eec
10.1145/2934676
Grant Details
Science Foundation Ireland (SFI)
13/RC/2106