Crowdsourcing speech and language data for resource-poor languages

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we present the benefits of using crowdsourcing to build speech and language resources for different annotation tasks, taking dialectal Arabic as an example of a resource-poor language. We give recommendations for job design and quality control that allow us to build high-quality data for a variety of tasks. Most of these recommendations are language-independent and can be applied to other languages as well. We summarize lessons learned from experiments in data acquisition tasks, such as image annotation (transcription of Arabic historical documents), machine translation (translation from English to Hindi), speech annotation (transcription of dialectal Arabic audio files), text annotation (conversion from dialectal Arabic to Modern Standard Arabic (MSA)), and text classification (annotation of offensive language on Arabic social media, and classification of questions on Arabic medical web forums).

Original language: English
Title of host publication: Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2017
Publisher: Springer Verlag
Pages: 440-447
Number of pages: 8
ISBN (Print): 9783319648606
DOIs: https://doi.org/10.1007/978-3-319-64861-3_41
Publication status: Published - 1 Jan 2018
Event: 3rd International Conference on Advanced Intelligent Systems and Informatics, AISI 2017 - Cairo, Egypt
Duration: 9 Sep 2017 to 11 Sep 2017

Publication series

Name: Advances in Intelligent Systems and Computing
Volume: 639
ISSN (Print): 2194-5357

Other

Other: 3rd International Conference on Advanced Intelligent Systems and Informatics, AISI 2017
Country: Egypt
City: Cairo
Period: 9/9/17 to 11/9/17

Keywords

  • Crowdsourcing
  • Dialectal Arabic
  • Low-resource languages

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Computer Science (all)

Cite this

Mubarak, H. (2018). Crowdsourcing speech and language data for resource-poor languages. In Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2017 (pp. 440-447). (Advances in Intelligent Systems and Computing; Vol. 639). Springer Verlag. https://doi.org/10.1007/978-3-319-64861-3_41