An Analysis (and an Annotated Corpus) of user responses to machine translation output

Daniele Pighin, Lluis Marques, Jonathan May

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

We present an annotated resource consisting of open-domain translation requests, automatic translations and user-provided corrections collected from casual users of the translation portal http://reverso.net. The layers of annotation provide: 1) segment-level quality assessments for 830 correction suggestions for translations into English, and 2) 814 usefulness assessments for English-Spanish and English-French translation suggestions, where a suggestion is considered useful if it contains at least local clues that can be used to improve translation quality. We also discuss the results of our preliminary experiments concerning 1) the development of an automatic filter to separate useful from non-useful feedback, and 2) the incorporation of bilingual phrases extracted from the suggestions into the machine translation pipeline. The annotated data, available for download from ftp://mi.eng.cam.ac.uk/data/faust/LW-UPC-Oct11-FAUST-feedback-annotation.tgz, is released under a Creative Commons license. To the best of our knowledge, this is the first resource of this kind to be made publicly available.
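As a rough illustration of what "local clues" in a correction suggestion can look like, the sketch below compares a machine translation output with a user's suggestion token by token, returns the spans the user changed, and applies a crude keep/discard test. This is not the authors' filter: the function names `local_edits` and `looks_useful`, the whitespace tokenization and the 0.5 similarity threshold are hypothetical choices made only for illustration.

```python
from difflib import SequenceMatcher


def local_edits(mt_output: str, suggestion: str) -> list[tuple[str, str]]:
    """Return (mt_span, user_span) pairs where the suggestion differs from the
    MT output. Hypothetical helper; tokenizes naively on whitespace."""
    mt_tokens = mt_output.split()
    sug_tokens = suggestion.split()
    matcher = SequenceMatcher(a=mt_tokens, b=sug_tokens, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert"
            edits.append((" ".join(mt_tokens[i1:i2]), " ".join(sug_tokens[j1:j2])))
    return edits


def looks_useful(mt_output: str, suggestion: str, min_similarity: float = 0.5) -> bool:
    """Hypothetical filter: keep a suggestion only if it changes something
    but still shares most tokens with the MT output (assumed threshold)."""
    mt_tokens = mt_output.split()
    sug_tokens = suggestion.split()
    if mt_tokens == sug_tokens:
        return False  # no edit, so no clue to learn from
    similarity = SequenceMatcher(a=mt_tokens, b=sug_tokens, autojunk=False).ratio()
    return similarity >= min_similarity


if __name__ == "__main__":
    mt = "I have 20 years"
    fix = "I am 20 years old"
    print(local_edits(mt, fix))   # [('have', 'am'), ('', 'old')]
    print(looks_useful(mt, fix))  # True
```

A realistic filter would also have to deal with punctuation, casing, and users who retranslate the source from scratch; the similarity ratio here is only a stand-in for whatever features such a classifier would use.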

Original language: English
Title of host publication: Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012
Publisher: European Language Resources Association (ELRA)
Pages: 1131-1136
Number of pages: 6
ISBN (Electronic): 9782951740877
Publication status: Published - 1 Jan 2012
Event: 8th International Conference on Language Resources and Evaluation, LREC 2012 - Istanbul, Turkey
Duration: 21 May 2012 to 27 May 2012

Keywords

  • Annotated Corpus
  • Feedback Filtering
  • Machine Translation

ASJC Scopus subject areas

  • Linguistics and Language
  • Language and Linguistics
  • Education
  • Library and Information Sciences

Cite this

Pighin, D., Marques, L., & May, J. (2012). An Analysis (and an Annotated Corpus) of user responses to machine translation output. In Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012 (pp. 1131-1136). European Language Resources Association (ELRA).

An Analysis (and an Annotated Corpus) of user responses to machine translation output. / Pighin, Daniele; Marques, Lluis; May, Jonathan.

Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012. European Language Resources Association (ELRA), 2012. p. 1131-1136.

Pighin, D, Marques, L & May, J 2012, An Analysis (and an Annotated Corpus) of user responses to machine translation output. in Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012. European Language Resources Association (ELRA), pp. 1131-1136, 8th International Conference on Language Resources and Evaluation, LREC 2012, Istanbul, Turkey, 21/5/12.

Pighin D, Marques L, May J. An Analysis (and an Annotated Corpus) of user responses to machine translation output. In Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012. European Language Resources Association (ELRA). 2012. p. 1131-1136.

Pighin, Daniele ; Marques, Lluis ; May, Jonathan. / An Analysis (and an Annotated Corpus) of user responses to machine translation output. Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012. European Language Resources Association (ELRA), 2012. pp. 1131-1136

@inproceedings{abe9bc91df9e427fb319d616a9c5a56e,
title = "An Analysis (and an Annotated Corpus) of user responses to machine translation output",
keywords = "Annotated Corpus, Feedback Filtering, Machine Translation",
author = "Daniele Pighin and Lluis Marques and Jonathan May",
year = "2012",
month = "1",
day = "1",
language = "English",
pages = "1131--1136",
booktitle = "Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012",
publisher = "European Language Resources Association (ELRA)",

}

TY - GEN

T1 - An Analysis (and an Annotated Corpus) of user responses to machine translation output

AU - Pighin, Daniele

AU - Marques, Lluis

AU - May, Jonathan

PY - 2012/1/1

Y1 - 2012/1/1

AB - We present an annotated resource consisting of open-domain translation requests, automatic translations and user-provided corrections collected from casual users of the translation portal http://reverso.net. The layers of annotation provide: 1) segment-level quality assessments for 830 correction suggestions for translations into English, and 2) 814 usefulness assessments for English-Spanish and English-French translation suggestions, where a suggestion is considered useful if it contains at least local clues that can be used to improve translation quality. We also discuss the results of our preliminary experiments concerning 1) the development of an automatic filter to separate useful from non-useful feedback, and 2) the incorporation of bilingual phrases extracted from the suggestions into the machine translation pipeline. The annotated data, available for download from ftp://mi.eng.cam.ac.uk/data/faust/LW-UPC-Oct11-FAUST-feedback-annotation.tgz, is released under a Creative Commons license. To the best of our knowledge, this is the first resource of this kind to be made publicly available.

KW - Annotated Corpus

KW - Feedback Filtering

KW - Machine Translation

UR - http://www.scopus.com/inward/record.url?scp=84896063931&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84896063931&partnerID=8YFLogxK

M3 - Conference contribution

SP - 1131

EP - 1136

BT - Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012

PB - European Language Resources Association (ELRA)

ER -