Using stem-templates to improve Arabic POS and gender/number tagging

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

19 Citations (Scopus)

Abstract

This paper presents an end-to-end automatic processing system for Arabic. The system performs: correction of common spelling errors pertaining to different forms of alef, ta marbouta and ha, and alef maqsoura and ya; context-sensitive word segmentation into underlying clitics; POS tagging; and gender and number tagging of nouns and adjectives. We introduce the use of stem templates as a feature to improve POS tagging by 0.5% and to help ascertain the gender and number of nouns and adjectives. For gender and number tagging, we report accuracies that are significantly higher on previously unseen words compared to a state-of-the-art system.
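
The three groups of confusable letters named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's correction method, which the abstract does not describe; it only collapses each group to a single representative so that two spellings can be compared. The names CONFUSABLE, collapse_confusables and same_up_to_confusables are hypothetical, and the choice of representative letter in each group is an assumption.

```python
# Illustrative only: collapses the letter groups the abstract mentions.
# It does NOT restore the correct spelling, which is what the paper's
# system is reported to do; all names here are hypothetical.

CONFUSABLE = {
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0629": "\u0647",  # ta marbouta           -> ha
    "\u0649": "\u064A",  # alef maqsoura         -> ya
}

# str.maketrans accepts a dict that maps single characters to strings.
_TABLE = str.maketrans(CONFUSABLE)


def collapse_confusables(word: str) -> str:
    """Map every confusable letter to its group's representative."""
    return word.translate(_TABLE)


def same_up_to_confusables(a: str, b: str) -> bool:
    """True if two spellings differ only within the confusable groups."""
    return collapse_confusables(a) == collapse_confusables(b)


if __name__ == "__main__":
    with_ta_marbouta = "\u0645\u062F\u0631\u0633\u0629"  # spelled with ta marbouta
    with_ha = "\u0645\u062F\u0631\u0633\u0647"           # same word spelled with ha
    print(same_up_to_confusables(with_ta_marbouta, with_ha))
```

A comparison of this kind can only flag spellings that may be variants of one another; deciding which variant is correct needs context, which is beyond this sketch.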

Original language: English
Title of host publication: Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
Publisher: European Language Resources Association (ELRA)
Pages: 2926-2931
Number of pages: 6
ISBN (Electronic): 9782951740884
Publication status: Published - 1 Jan 2014
Event: 9th International Conference on Language Resources and Evaluation, LREC 2014 - Reykjavik, Iceland
Duration: 26 May 2014 to 31 May 2014

Fingerprint

gender
Template
Tagging
Nouns
Adjective
segmentation
Spelling
Word Segmentation
Automatic Processing
Clitics

Keywords

  • Arabic
  • Denormalization
  • Part of Speech Tagging

ASJC Scopus subject areas

  • Linguistics and Language
  • Library and Information Sciences
  • Education
  • Language and Linguistics

Cite this

Darwish, K., Abdelali, A., & Mubarak, H. (2014). Using stem-templates to improve Arabic POS and gender/number tagging. In Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014 (pp. 2926-2931). European Language Resources Association (ELRA).

@inproceedings{da4440294c6a4bd4bb0e50ad7b896cec,
title = "Using stem-templates to improve Arabic POS and gender/number tagging",
abstract = "This paper presents an end-to-end automatic processing system for Arabic. The system performs: correction of common spelling errors pertaining to different forms of alef, ta marbouta and ha, and alef maqsoura and ya; context-sensitive word segmentation into underlying clitics; POS tagging; and gender and number tagging of nouns and adjectives. We introduce the use of stem templates as a feature to improve POS tagging by 0.5{\%} and to help ascertain the gender and number of nouns and adjectives. For gender and number tagging, we report accuracies that are significantly higher on previously unseen words compared to a state-of-the-art system.",
keywords = "Arabic, Denormalization, Part of Speech Tagging",
author = "Kareem Darwish and Ahmed Abdelali and Hamdy Mubarak",
year = "2014",
month = "1",
day = "1",
language = "English",
pages = "2926--2931",
booktitle = "Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014",
publisher = "European Language Resources Association (ELRA)",
}

TY - GEN
T1 - Using stem-templates to improve Arabic POS and gender/number tagging
AU - Darwish, Kareem
AU - Abdelali, Ahmed
AU - Mubarak, Hamdy
PY - 2014/1/1
Y1 - 2014/1/1
AB - This paper presents an end-to-end automatic processing system for Arabic. The system performs: correction of common spelling errors pertaining to different forms of alef, ta marbouta and ha, and alef maqsoura and ya; context-sensitive word segmentation into underlying clitics; POS tagging; and gender and number tagging of nouns and adjectives. We introduce the use of stem templates as a feature to improve POS tagging by 0.5% and to help ascertain the gender and number of nouns and adjectives. For gender and number tagging, we report accuracies that are significantly higher on previously unseen words compared to a state-of-the-art system.
KW - Arabic
KW - Denormalization
KW - Part of Speech Tagging
UR - http://www.scopus.com/inward/record.url?scp=84961325242&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84961325242&partnerID=8YFLogxK
M3 - Conference contribution
SP - 2926
EP - 2931
BT - Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
PB - European Language Resources Association (ELRA)
ER -