A human judgment corpus and a metric for Arabic MT evaluation

Houda Bouamor, Hanan Alshikhabobakr, Behrang Mohit, Kemal Oflazer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

We present a human judgments dataset and an adapted metric for evaluation of Arabic machine translation. Our medium-scale dataset is the first of its kind for Arabic with high annotation quality. We use the dataset to adapt the BLEU score for Arabic. Our score (AL-BLEU) provides partial credit for stem and morphological matches between hypothesis and reference words. We evaluate BLEU, METEOR and AL-BLEU on our human judgments corpus and show that AL-BLEU has the highest correlation with human judgments. We are releasing the dataset and software to the research community.
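
The idea of partial credit for stem and morphological matches can be illustrated with a small sketch. This is not the authors' AL-BLEU formulation: the `Token` structure, the weights `w_stem` and `w_morph`, the feature names, and the greedy alignment are all assumptions chosen for illustration, and only unigram precision is shown (no higher-order n-grams, no brevity penalty).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Token:
    """Hypothetical analyzed word: surface form, stem, morphological features."""
    surface: str
    stem: str
    feats: Dict[str, str] = field(default_factory=dict)


def token_credit(hyp: Token, ref: Token,
                 w_stem: float = 0.5, w_morph: float = 0.3) -> float:
    """Credit in [0, 1] for matching one hypothesis word to one reference word.

    The weights are illustrative assumptions, not values from the paper.
    """
    if hyp.surface == ref.surface:
        return 1.0  # exact match earns full credit, as in plain BLEU
    if hyp.stem != ref.stem:
        return 0.0  # no shared stem, no credit
    # Shared stem: add credit in proportion to agreeing morphological features.
    keys = set(hyp.feats) | set(ref.feats)
    agree = sum(1 for k in keys if hyp.feats.get(k) == ref.feats.get(k))
    morph_frac = agree / len(keys) if keys else 0.0
    return w_stem + w_morph * morph_frac


def unigram_precision(hyp: List[Token], ref: List[Token]) -> float:
    """Partial-credit unigram precision with greedy one-to-one alignment."""
    if not hyp:
        return 0.0
    remaining = list(ref)
    total = 0.0
    for h in hyp:
        if not remaining:
            break
        best = max(range(len(remaining)),
                   key=lambda i: token_credit(h, remaining[i]))
        credit = token_credit(h, remaining[best])
        if credit > 0:
            total += credit
            remaining.pop(best)  # each reference word is used at most once
    return total / len(hyp)


# Toy example with transliterated tokens and made-up analyses.
hyp = [Token("ktb", "ktb"),
       Token("AlktAb", "ktAb", {"num": "sg", "def": "yes"})]
ref = [Token("ktb", "ktb"),
       Token("ktAb", "ktAb", {"num": "sg", "def": "no"})]
print(unigram_precision(hyp, ref))
```

In this toy case the second hypothesis word differs from the reference on the surface but shares its stem and one of two features, so it earns a fractional credit rather than the zero that exact-match BLEU would assign.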

Original language: English
Title of host publication: EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 207-213
Number of pages: 7
ISBN (Electronic): 9781937284961
Publication status: Published - 2014
Externally published: Yes
Event: 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014 - Doha, Qatar
Duration: 25 Oct 2014 to 29 Oct 2014

Other

Other: 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014
Country: Qatar
City: Doha
Period: 25/10/14 to 29/10/14

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Vision and Pattern Recognition
  • Information Systems

Cite this

Bouamor, H., Alshikhabobakr, H., Mohit, B., & Oflazer, K. (2014). A human judgment corpus and a metric for Arabic MT evaluation. In EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 207-213). Association for Computational Linguistics (ACL).

A human judgment corpus and a metric for Arabic MT evaluation. / Bouamor, Houda; Alshikhabobakr, Hanan; Mohit, Behrang; Oflazer, Kemal.

EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. Association for Computational Linguistics (ACL), 2014. p. 207-213.

Bouamor, H, Alshikhabobakr, H, Mohit, B & Oflazer, K 2014, A human judgment corpus and a metric for Arabic MT evaluation. in EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. Association for Computational Linguistics (ACL), pp. 207-213, 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, Doha, Qatar, 25/10/14.
Bouamor H, Alshikhabobakr H, Mohit B, Oflazer K. A human judgment corpus and a metric for Arabic MT evaluation. In EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. Association for Computational Linguistics (ACL). 2014. p. 207-213
Bouamor, Houda ; Alshikhabobakr, Hanan ; Mohit, Behrang ; Oflazer, Kemal. / A human judgment corpus and a metric for Arabic MT evaluation. EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. Association for Computational Linguistics (ACL), 2014. pp. 207-213
@inproceedings{1ae8fb9ece2447a5968e6750981fcb0d,
title = "A human judgment corpus and a metric for Arabic MT evaluation",
abstract = "We present a human judgments dataset and an adapted metric for evaluation of Arabic machine translation. Our medium-scale dataset is the first of its kind for Arabic with high annotation quality. We use the dataset to adapt the BLEU score for Arabic. Our score (AL-BLEU) provides partial credit for stem and morphological matches between hypothesis and reference words. We evaluate BLEU, METEOR and AL-BLEU on our human judgments corpus and show that AL-BLEU has the highest correlation with human judgments. We are releasing the dataset and software to the research community.",
author = "Houda Bouamor and Hanan Alshikhabobakr and Behrang Mohit and Kemal Oflazer",
year = "2014",
language = "English",
pages = "207--213",
booktitle = "EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference",
publisher = "Association for Computational Linguistics (ACL)",
}

TY - GEN

T1 - A human judgment corpus and a metric for Arabic MT evaluation

AU - Bouamor, Houda

AU - Alshikhabobakr, Hanan

AU - Mohit, Behrang

AU - Oflazer, Kemal

PY - 2014

Y1 - 2014

N2 - We present a human judgments dataset and an adapted metric for evaluation of Arabic machine translation. Our medium-scale dataset is the first of its kind for Arabic with high annotation quality. We use the dataset to adapt the BLEU score for Arabic. Our score (AL-BLEU) provides partial credit for stem and morphological matches between hypothesis and reference words. We evaluate BLEU, METEOR and AL-BLEU on our human judgments corpus and show that AL-BLEU has the highest correlation with human judgments. We are releasing the dataset and software to the research community.

AB - We present a human judgments dataset and an adapted metric for evaluation of Arabic machine translation. Our medium-scale dataset is the first of its kind for Arabic with high annotation quality. We use the dataset to adapt the BLEU score for Arabic. Our score (AL-BLEU) provides partial credit for stem and morphological matches between hypothesis and reference words. We evaluate BLEU, METEOR and AL-BLEU on our human judgments corpus and show that AL-BLEU has the highest correlation with human judgments. We are releasing the dataset and software to the research community.

UR - http://www.scopus.com/inward/record.url?scp=84926030281&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84926030281&partnerID=8YFLogxK

M3 - Conference contribution

SP - 207

EP - 213

BT - EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

PB - Association for Computational Linguistics (ACL)

ER -