Enabling medical translation for low-resource languages

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present research towards bridging the language gap between migrant workers in Qatar and medical staff. In particular, we present the first steps towards the development of a real-world Hindi-English machine translation system for doctor-patient communication. As this is a low-resource language pair, especially for speech and for the medical domain, our initial focus has been on gathering suitable training data from various sources. We applied a variety of methods ranging from fully automatic extraction from the Web to manual annotation of test data. Moreover, we developed a method for automatically augmenting the training data with synthetically generated variants, which yielded a very sizable improvement of more than 3 BLEU points absolute.

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 17th International Conference, CICLing 2016, Revised Selected Papers
Publisher: Springer Verlag
Pages: 3-16
Number of pages: 14
ISBN (Print): 9783319754864
DOIs: https://doi.org/10.1007/978-3-319-75487-1_1
Publication status: Published - 1 Jan 2018
Event: 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2016 - Konya, Turkey
Duration: 3 Apr 2016 to 9 Apr 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9624 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2016
Country: Turkey
City: Konya
Period: 3/4/16 to 9/4/16

Fingerprint

Resources
Communication
Machine Translation
Annotation
Language
Training
Speech

Keywords

  • Doctor-patient communication
  • Hindi
  • Machine translation
  • Medical translation
  • Resource-poor languages

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Musleh, A., Durrani, N., Temnikova, I., Nakov, P., Vogel, S., & Alsaad, O. (2018). Enabling medical translation for low-resource languages. In Computational Linguistics and Intelligent Text Processing - 17th International Conference, CICLing 2016, Revised Selected Papers (pp. 3-16). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9624 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-75487-1_1

@inproceedings{296b35865edb409a87d4d2ec288e6bbb,
title = "Enabling medical translation for low-resource languages",
abstract = "We present research towards bridging the language gap between migrant workers in Qatar and medical staff. In particular, we present the first steps towards the development of a real-world Hindi-English machine translation system for doctor-patient communication. As this is a low-resource language pair, especially for speech and for the medical domain, our initial focus has been on gathering suitable training data from various sources. We applied a variety of methods ranging from fully automatic extraction from the Web to manual annotation of test data. Moreover, we developed a method for automatically augmenting the training data with synthetically generated variants, which yielded a very sizable improvement of more than 3 BLEU points absolute.",
keywords = "Doctor-patient communication, Hindi, Machine translation, Medical translation, Resource-poor languages",
author = "Ahmad Musleh and Nadir Durrani and Irina Temnikova and Preslav Nakov and Stephan Vogel and Osama Alsaad",
year = "2018",
month = "1",
day = "1",
doi = "10.1007/978-3-319-75487-1_1",
language = "English",
isbn = "9783319754864",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "3--16",
booktitle = "Computational Linguistics and Intelligent Text Processing - 17th International Conference, CICLing 2016, Revised Selected Papers",

}

TY - GEN

T1 - Enabling medical translation for low-resource languages

AU - Musleh, Ahmad

AU - Durrani, Nadir

AU - Temnikova, Irina

AU - Nakov, Preslav

AU - Vogel, Stephan

AU - Alsaad, Osama

PY - 2018/1/1

Y1 - 2018/1/1

N2 - We present research towards bridging the language gap between migrant workers in Qatar and medical staff. In particular, we present the first steps towards the development of a real-world Hindi-English machine translation system for doctor-patient communication. As this is a low-resource language pair, especially for speech and for the medical domain, our initial focus has been on gathering suitable training data from various sources. We applied a variety of methods ranging from fully automatic extraction from the Web to manual annotation of test data. Moreover, we developed a method for automatically augmenting the training data with synthetically generated variants, which yielded a very sizable improvement of more than 3 BLEU points absolute.

KW - Doctor-patient communication

KW - Hindi

KW - Machine translation

KW - Medical translation

KW - Resource-poor languages

UR - http://www.scopus.com/inward/record.url?scp=85044423883&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85044423883&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-75487-1_1

DO - 10.1007/978-3-319-75487-1_1

M3 - Conference contribution

SN - 9783319754864

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 3

EP - 16

BT - Computational Linguistics and Intelligent Text Processing - 17th International Conference, CICLing 2016, Revised Selected Papers

PB - Springer Verlag

ER -