Using stem-templates to improve Arabic POS and gender/number tagging

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

19 Citations (Scopus)

Abstract

This paper presents an end-to-end automatic processing system for Arabic. The system performs correction of common spelling errors involving the different forms of alef, ta marbouta and ha, and alef maqsoura and ya; context-sensitive segmentation of words into their underlying clitics; POS tagging; and gender and number tagging of nouns and adjectives. We introduce stem templates as a feature that improves POS tagging by 0.5% and helps ascertain the gender and number of nouns and adjectives. For gender and number tagging, we report accuracies on previously unseen words that are significantly higher than those of a state-of-the-art system.

Original language: English
Title of host publication: Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
Publisher: European Language Resources Association (ELRA)
Pages: 2926-2931
Number of pages: 6
ISBN (Electronic): 9782951740884
Publication status: Published - 1 Jan 2014
Event: 9th International Conference on Language Resources and Evaluation, LREC 2014 - Reykjavik, Iceland
Duration: 26 May 2014 to 31 May 2014

Other

Other: 9th International Conference on Language Resources and Evaluation, LREC 2014
Country: Iceland
City: Reykjavik
Period: 26/5/14 to 31/5/14

Keywords

  • Arabic
  • Denormalization
  • Part of Speech Tagging

ASJC Scopus subject areas

  • Linguistics and Language
  • Library and Information Sciences
  • Education
  • Language and Linguistics

Cite this

Darwish, K., Abdelali, A., & Mubarak, H. (2014). Using stem-templates to improve Arabic POS and gender/number tagging. In Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014 (pp. 2926-2931). European Language Resources Association (ELRA).