Active learning and crowd-sourcing for machine translation

Vamshi Ambati, Stephan Vogel, Jaime Carbonell

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

81 Citations (Scopus)

Abstract

In recent years, corpus-based approaches to machine translation have become predominant, with Statistical Machine Translation (SMT) being the most actively progressing area. The success of these approaches depends on the availability of parallel corpora. In this paper we propose Active Crowd Translation (ACT), a new paradigm in which active learning and crowd-sourcing come together to enable automatic translation for low-resource language pairs. Active learning aims to reduce the cost of label acquisition by prioritizing the most informative data for annotation, while crowd-sourcing reduces cost by using the power of the crowd to make up for the lack of expensive language experts. We compare our active learning strategies against strong baselines and see significant improvements in translation quality. Similarly, our experiments with crowd-sourcing on Mechanical Turk show that it is possible to create parallel corpora using non-experts and that, with sufficient quality assurance, a translation system trained on such a corpus approaches expert quality.
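The two ideas in the abstract, prioritizing informative sentences for translation and aggregating redundant non-expert translations, can be illustrated with a minimal Python sketch. This is not the authors' method: the n-gram novelty score, the majority-vote aggregation, and all function names (`novelty_score`, `select_batch`, `pick_translation`) are assumptions introduced here purely for illustration.

```python
from collections import Counter


def ngrams(tokens, n=2):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def novelty_score(sentence, seen_counts, n=2):
    """Fraction of the sentence's n-grams absent from the labeled data.

    Hypothetical informativeness measure: a sentence made mostly of
    unseen n-grams is assumed to be more useful to translate next.
    """
    grams = ngrams(sentence.split(), n)
    if not grams:
        return 0.0
    unseen = sum(1 for g in grams if seen_counts[g] == 0)
    return unseen / len(grams)


def select_batch(pool, labeled, batch_size, n=2):
    """Pick the batch_size most novel sentences from the unlabeled pool."""
    seen = Counter()
    for s in labeled:
        seen.update(ngrams(s.split(), n))
    # sorted() is stable, so ties keep their original pool order.
    ranked = sorted(pool, key=lambda s: novelty_score(s, seen, n), reverse=True)
    return ranked[:batch_size]


def pick_translation(candidates):
    """Majority vote over case- and whitespace-normalized crowd translations.

    A simple stand-in for quality assurance: the translation submitted
    most often by independent workers is kept.
    """
    normalized = [" ".join(c.lower().split()) for c in candidates]
    return Counter(normalized).most_common(1)[0][0]


if __name__ == "__main__":
    labeled = ["the cat sat"]
    pool = ["the cat sat", "a dog ran fast", "the cat ran"]
    print(select_batch(pool, labeled, batch_size=2))
    print(pick_translation(["The house", "the  house", "a home"]))
```

In this sketch the already-translated sentence scores zero and is never selected, while sentences with entirely new bigrams are sent to the crowd first. Real systems would typically combine several signals (for example, model uncertainty) and use stronger agreement measures than exact-match voting.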

Original language: English
Title of host publication: Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
Publisher: European Language Resources Association (ELRA)
Pages: 2169-2174
Number of pages: 6
ISBN (Electronic): 2951740867, 9782951740860
Publication status: Published - 1 Jan 2010
Externally published: Yes
Event: 7th International Conference on Language Resources and Evaluation, LREC 2010 - Valletta, Malta
Duration: 17 May 2010 to 23 May 2010

ASJC Scopus subject areas

  • Education
  • Library and Information Sciences
  • Linguistics and Language
  • Language and Linguistics

Cite this

Ambati, V., Vogel, S., & Carbonell, J. (2010). Active learning and crowd-sourcing for machine translation. In Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010 (pp. 2169-2174). European Language Resources Association (ELRA).

@inproceedings{e00cfd5cbab84d86a0728f81348d4308,
title = "Active learning and crowd-sourcing for machine translation",
author = "Vamshi Ambati and Stephan Vogel and Jaime Carbonell",
year = "2010",
month = "1",
day = "1",
language = "English",
pages = "2169--2174",
booktitle = "Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010",
publisher = "European Language Resources Association (ELRA)",

}

TY - GEN
T1 - Active learning and crowd-sourcing for machine translation
AU - Ambati, Vamshi
AU - Vogel, Stephan
AU - Carbonell, Jaime
PY - 2010/1/1
Y1 - 2010/1/1
UR - http://www.scopus.com/inward/record.url?scp=84883366075&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84883366075&partnerID=8YFLogxK
M3 - Conference contribution
SP - 2169
EP - 2174
BT - Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
PB - European Language Resources Association (ELRA)
ER -