A single-model approach for Arabic segmentation, POS tagging, and named entity recognition

Abed Alhakim Freihat, Gabor Bella, Hamdy Mubarak, Fausto Giunchiglia

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

4 Citations (Scopus)

Abstract

This paper presents an entirely new, one-million-word annotated corpus for comprehensive, machine-learning-based preprocessing of text in Modern Standard Arabic. Contrary to the conventional pipeline architecture, we solve the NLP tasks of word segmentation, POS tagging and named entity recognition as a single sequence labeling task. This single-component configuration results in faster operation and is able to provide state-of-the-art precision and recall according to our evaluations. The fine-grained tag set output by our annotator greatly simplifies downstream tasks such as lemmatization. Provided as a trained OpenNLP component, the annotator is free for research purposes.
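To illustrate what treating the three tasks as a single sequence labeling task can look like, here is a minimal sketch in which each token receives one composite label that carries its segmentation, the POS tag of each segment, and its entity tag. The label format (`TAG:length+TAG:length|NER`), the tag names, the helper functions `encode` and `decode`, and the transliterated example token are all assumptions made for illustration; they are not the tag set or the implementation described in the paper.

```python
from typing import List, Tuple


def encode(segments: List[str], pos_tags: List[str], ner: str) -> str:
    """Fuse segmentation, POS and NER into one label (hypothetical format)."""
    if len(segments) != len(pos_tags):
        raise ValueError("expected one POS tag per segment")
    # Each segment contributes "TAG:length"; "+" marks a segment boundary
    # and "|" separates the token-level entity tag.
    parts = [f"{tag}:{len(seg)}" for seg, tag in zip(segments, pos_tags)]
    return "+".join(parts) + "|" + ner


def decode(surface: str, label: str) -> Tuple[List[str], List[str], str]:
    """Recover the three annotation layers from a surface token and its label."""
    morph, ner = label.rsplit("|", 1)
    segments, pos_tags, start = [], [], 0
    for part in morph.split("+"):
        tag, length = part.rsplit(":", 1)
        end = start + int(length)
        segments.append(surface[start:end])
        pos_tags.append(tag)
        start = end
    if start != len(surface):
        raise ValueError("label does not cover the whole token")
    return segments, pos_tags, ner


# Assumed example: transliterated token "wbyrwt" ("and Beirut"),
# split into a conjunction proclitic and a proper noun.
label = encode(["w", "byrwt"], ["CONJ", "NOUN_PROP"], "B-LOC")
print(label)
print(decode("wbyrwt", label))
```

With labels of this kind, a single sequence classifier predicts one label per token, and the segmentation, POS and entity layers are read back from that label instead of being produced by three separate pipeline stages.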

Original language: English
Title of host publication: 2nd International Conference on Natural Language and Speech Processing, ICNLSP 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-8
Number of pages: 8
ISBN (Electronic): 9781538645437
DOIs: https://doi.org/10.1109/ICNLSP.2018.8374393
Publication status: Published - 6 Jun 2018
Event: 2nd International Conference on Natural Language and Speech Processing, ICNLSP 2018 - Algiers, Algeria
Duration: 25 Apr 2018 to 26 Apr 2018

Other

Other: 2nd International Conference on Natural Language and Speech Processing, ICNLSP 2018
Country: Algeria
City: Algiers
Period: 25/4/18 to 26/4/18

Keywords

  • Lemmatization
  • Machine learning
  • Named entity recognition
  • NLP
  • POS tagging
  • Segmentation

ASJC Scopus subject areas

  • Linguistics and Language
  • Communication
  • Artificial Intelligence
  • Signal Processing

Cite this

Freihat, A. A., Bella, G., Mubarak, H., & Giunchiglia, F. (2018). A single-model approach for Arabic segmentation, POS tagging, and named entity recognition. In 2nd International Conference on Natural Language and Speech Processing, ICNLSP 2018 (pp. 1-8). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICNLSP.2018.8374393