Communicating unknown words in machine translation

Matthias Eck, Stephan Vogel, Alex Waibel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

A new approach to handling unknown words in machine translation is presented. The basic idea is to find definitions for the unknown words on the source language side and translate those definitions instead. Only monolingual resources are required; these generally offer broader coverage than bilingual resources and are available for a large number of languages. To use this in a machine translation system, definitions are extracted automatically from online dictionaries and encyclopedias. The translated definition is then inserted into the original hypothesis and clearly marked. This is shown to lead to significant improvements in (subjective) translation quality.
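The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `[definition: ...]` marker format, the toy German-English lexicon, and the hard-coded definition table (standing in for definitions extracted from online dictionaries and encyclopedias) are all assumptions made for illustration. The sketch also translates the known words around an unknown word in separate runs, a simplification that a real system would not necessarily make.

```python
from typing import Callable, Dict, List, Set

# Invented toy data (assumption): a word-for-word lexicon standing in for an MT system.
TOY_LEXICON: Dict[str, str] = {
    "der": "the", "arzt": "doctor", "verschrieb": "prescribed",
    "medikamente": "drugs", "gegen": "against", "bakterien": "bacteria",
}

# Invented monolingual (source-side) definitions (assumption): a stand-in for
# definitions extracted automatically from online dictionaries or encyclopedias.
DEFINITIONS: Dict[str, str] = {"antibiotika": "medikamente gegen bakterien"}


def toy_translate(tokens: List[str]) -> List[str]:
    """Translate word by word; words missing from the lexicon pass through."""
    return [TOY_LEXICON.get(tok, tok) for tok in tokens]


def translate_with_definitions(
    source_tokens: List[str],
    known_vocab: Set[str],
    definitions: Dict[str, str],
    translate: Callable[[List[str]], List[str]],
) -> str:
    """Translate a sentence, replacing unknown words by marked, translated definitions."""
    output: List[str] = []
    segment: List[str] = []  # run of known words waiting to be translated

    def flush() -> None:
        # Translate the pending run of known words and append it to the output.
        if segment:
            output.extend(translate(segment))
            segment.clear()

    for token in source_tokens:
        if token in known_vocab:
            segment.append(token)
            continue
        flush()
        definition = definitions.get(token)
        if definition is None:
            # No source-side definition available: keep the unknown word as-is.
            output.append(token)
        else:
            # Translate the definition instead of the word and mark the insertion.
            translated = " ".join(translate(definition.split()))
            output.append(f"{token} [definition: {translated}]")
    flush()
    return " ".join(output)


if __name__ == "__main__":
    sentence = "der arzt verschrieb antibiotika".split()
    print(translate_with_definitions(sentence, set(TOY_LEXICON), DEFINITIONS, toy_translate))
    # prints: the doctor prescribed antibiotika [definition: drugs against bacteria]
```

Keeping the untranslated word next to the bracketed definition is one possible way to make the insertion "clearly marked"; the marking scheme used in the paper is not given in the abstract.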

Original language: English
Title of host publication: Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008
Publisher: European Language Resources Association (ELRA)
Pages: 1542-1547
Number of pages: 6
ISBN (Electronic): 2951740840, 9782951740846
Publication status: Published - 1 Jan 2008
Externally published: Yes
Event: 6th International Conference on Language Resources and Evaluation, LREC 2008 - Marrakech, Morocco
Duration: 28 May 2008 to 30 May 2008

Fingerprint

Machine Translation
Machine Translation System
Source Language
Online Dictionary
Language
Resources
Dictionary
Coverage

ASJC Scopus subject areas

  • Library and Information Sciences
  • Linguistics and Language
  • Language and Linguistics
  • Education

Cite this

Eck, M., Vogel, S., & Waibel, A. (2008). Communicating unknown words in machine translation. In Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008 (pp. 1542-1547). European Language Resources Association (ELRA).

@inproceedings{fed414cb40b542728cb52b52ba709567,
title = "Communicating unknown words in machine translation",
abstract = "A new approach to handle unknown words in machine translation is presented. The basic idea is to find definitions for the unknown words on the source language side and translate those definitions instead. Only monolingual resources are required, which generally offer a broader coverage than bilingual resources and are available for a large number of languages. In order to use this in a machine translation system definitions are extracted automatically from online dictionaries and encyclopedias. The translated definition is then inserted and clearly marked in the original hypothesis. This is shown to lead to significant improvements in (subjective) translation quality.",
author = "Matthias Eck and Stephan Vogel and Alex Waibel",
year = "2008",
month = "1",
day = "1",
language = "English",
pages = "1542--1547",
booktitle = "Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008",
publisher = "European Language Resources Association (ELRA)",
}

TY  - GEN
T1  - Communicating unknown words in machine translation
AU  - Eck, Matthias
AU  - Vogel, Stephan
AU  - Waibel, Alex
PY  - 2008/1/1
Y1  - 2008/1/1
AB  - A new approach to handle unknown words in machine translation is presented. The basic idea is to find definitions for the unknown words on the source language side and translate those definitions instead. Only monolingual resources are required, which generally offer a broader coverage than bilingual resources and are available for a large number of languages. In order to use this in a machine translation system definitions are extracted automatically from online dictionaries and encyclopedias. The translated definition is then inserted and clearly marked in the original hypothesis. This is shown to lead to significant improvements in (subjective) translation quality.
UR  - http://www.scopus.com/inward/record.url?scp=84981333159&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84981333159&partnerID=8YFLogxK
M3  - Conference contribution
SP  - 1542
EP  - 1547
BT  - Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008
PB  - European Language Resources Association (ELRA)
ER  -