Learning from relatives: Unified dialectal Arabic segmentation

Younes Samih, Mohamed Eldesouki, Mohammed Attia, Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

4 Citations (Scopus)

Abstract

Arabic dialects do not just share a common koiné, but there are shared pan-dialectal linguistic phenomena that allow computational models for dialects to learn from each other. In this paper we build a unified segmentation model where the training data for different dialects are combined and a single model is trained. The model yields higher accuracies than dialect-specific models, eliminating the need for dialect identification before segmentation. We also measure the degree of relatedness between four major Arabic dialects by testing how a segmentation model trained on one dialect performs on the other dialects. We found that linguistic relatedness is contingent with geographical proximity. In our experiments we use SVM-based ranking and bi-LSTM-CRF sequence labeling.
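
The two experimental setups named in the abstract (testing a model trained on one dialect against the others, and training a single model on pooled data) can be sketched roughly as below. This is an illustrative sketch only: the lookup-table learner is a deliberately trivial stand-in for the SVM-based ranker and the bi-LSTM-CRF tagger used in the paper, and the dialect labels, the transliterated toy words, and every function name are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# An example pairs a surface word with its segmented form, e.g. ("wktb", "w+ktb").
Example = Tuple[str, str]
Segmenter = Callable[[str], str]
Corpus = Dict[str, List[Example]]  # dialect label -> examples


def train_lookup_segmenter(examples: List[Example]) -> Segmenter:
    """Stand-in learner (assumption): memorize seen words, leave unseen ones unsegmented."""
    table = dict(examples)
    return lambda word: table.get(word, word)


def accuracy(segmenter: Segmenter, examples: List[Example]) -> float:
    """Word-level accuracy: share of words whose whole segmentation matches the gold form."""
    if not examples:
        return 0.0
    correct = sum(1 for word, gold in examples if segmenter(word) == gold)
    return correct / len(examples)


def cross_dialect_matrix(train: Corpus, test: Corpus) -> Dict[str, Dict[str, float]]:
    """Train one model per source dialect and score it on every target dialect."""
    matrix: Dict[str, Dict[str, float]] = {}
    for source, source_examples in train.items():
        model = train_lookup_segmenter(source_examples)
        matrix[source] = {
            target: accuracy(model, target_examples)
            for target, target_examples in test.items()
        }
    return matrix


def unified_scores(train: Corpus, test: Corpus) -> Dict[str, float]:
    """Pool the training data of all dialects, train once, score on each dialect."""
    pooled = [example for examples in train.values() for example in examples]
    model = train_lookup_segmenter(pooled)
    return {target: accuracy(model, examples) for target, examples in test.items()}


# Invented toy data (not from the paper); train and test overlap only to keep it tiny.
TOY_TRAIN: Corpus = {
    "dialect_a": [("wktb", "w+ktb"), ("bktb", "b+ktb")],
    "dialect_b": [("wktb", "w+ktb"), ("hktb", "h+ktb")],
}
TOY_TEST: Corpus = {
    "dialect_a": [("wktb", "w+ktb"), ("bktb", "b+ktb")],
    "dialect_b": [("wktb", "w+ktb"), ("hktb", "h+ktb")],
}

if __name__ == "__main__":
    print(cross_dialect_matrix(TOY_TRAIN, TOY_TEST))
    print(unified_scores(TOY_TRAIN, TOY_TEST))
```

On this toy data the pooled model scores at least as well on each dialect as either single-dialect model, but that follows from the invented examples and does not reproduce the paper's results.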

Original language: English
Title of host publication: CoNLL 2017 - 21st Conference on Computational Natural Language Learning, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 432-441
Number of pages: 10
ISBN (Electronic): 9781945626548
Publication status: Published - 1 Jan 2017
Event: 21st Conference on Computational Natural Language Learning, CoNLL 2017 - Vancouver, Canada
Duration: 3 Aug 2017 to 4 Aug 2017

Publication series

Name: CoNLL 2017 - 21st Conference on Computational Natural Language Learning, Proceedings

Conference

Conference: 21st Conference on Computational Natural Language Learning, CoNLL 2017
Country: Canada
City: Vancouver
Period: 3/8/17 to 4/8/17

Fingerprint

dialect
learning
Linguistics
Labeling
segmentation
Identification (control systems)
ranking
Testing
experiment

ASJC Scopus subject areas

  • Linguistics and Language
  • Artificial Intelligence
  • Human-Computer Interaction

Cite this

Samih, Y., Eldesouki, M., Attia, M., Darwish, K., Abdelali, A., Mubarak, H., & Kallmeyer, L. (2017). Learning from relatives: Unified dialectal Arabic segmentation. In CoNLL 2017 - 21st Conference on Computational Natural Language Learning, Proceedings (pp. 432-441). (CoNLL 2017 - 21st Conference on Computational Natural Language Learning, Proceedings). Association for Computational Linguistics (ACL).

@inproceedings{25d6bfe499044044968f0776c1a74d88,
title = "Learning from relatives: Unified dialectal Arabic segmentation",
abstract = "Arabic dialects do not just share a common koin{\'e}, but there are shared pan-dialectal linguistic phenomena that allow computational models for dialects to learn from each other. In this paper we build a unified segmentation model where the training data for different dialects are combined and a single model is trained. The model yields higher accuracies than dialect-specific models, eliminating the need for dialect identification before segmentation. We also measure the degree of relatedness between four major Arabic dialects by testing how a segmentation model trained on one dialect performs on the other dialects. We found that linguistic relatedness is contingent with geographical proximity. In our experiments we use SVM-based ranking and bi-LSTM-CRF sequence labeling.",
author = "Younes Samih and Mohamed Eldesouki and Mohammed Attia and Kareem Darwish and Ahmed Abdelali and Hamdy Mubarak and Laura Kallmeyer",
year = "2017",
month = "1",
day = "1",
language = "English",
series = "CoNLL 2017 - 21st Conference on Computational Natural Language Learning, Proceedings",
publisher = "Association for Computational Linguistics (ACL)",
pages = "432--441",
booktitle = "CoNLL 2017 - 21st Conference on Computational Natural Language Learning, Proceedings",

}

TY - GEN
T1 - Learning from relatives
T2 - Unified dialectal Arabic segmentation
AU - Samih, Younes
AU - Eldesouki, Mohamed
AU - Attia, Mohammed
AU - Darwish, Kareem
AU - Abdelali, Ahmed
AU - Mubarak, Hamdy
AU - Kallmeyer, Laura
PY - 2017/1/1
Y1 - 2017/1/1
AB - Arabic dialects do not just share a common koiné, but there are shared pan-dialectal linguistic phenomena that allow computational models for dialects to learn from each other. In this paper we build a unified segmentation model where the training data for different dialects are combined and a single model is trained. The model yields higher accuracies than dialect-specific models, eliminating the need for dialect identification before segmentation. We also measure the degree of relatedness between four major Arabic dialects by testing how a segmentation model trained on one dialect performs on the other dialects. We found that linguistic relatedness is contingent with geographical proximity. In our experiments we use SVM-based ranking and bi-LSTM-CRF sequence labeling.
UR - http://www.scopus.com/inward/record.url?scp=85048018379&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85048018379&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85048018379
T3 - CoNLL 2017 - 21st Conference on Computational Natural Language Learning, Proceedings
SP - 432
EP - 441
BT - CoNLL 2017 - 21st Conference on Computational Natural Language Learning, Proceedings
PB - Association for Computational Linguistics (ACL)
ER -