Document-level machine translation with word vector models

Eva Martínez Garcia, Cristina España-Bonet, Lluís Màrquez

Research output: Contribution to conference (Paper)

5 Citations (Scopus)

Abstract

In this paper we apply distributional semantic information to document-level machine translation. We train monolingual and bilingual word vector models on large corpora and evaluate them first on a cross-lingual lexical substitution task and then on the final translation task. For translation, we incorporate the semantic information into a statistical document-level decoder (Docent) by enforcing translation choices that are semantically similar to their context. As expected, the bilingual word vector models are better suited to translation. The final document-level translator with the semantic model outperforms the basic Docent (without semantics) and also slightly outperforms a standard sentence-level SMT system in terms of ULC (the average of a set of standard automatic evaluation metrics for MT). Finally, we present a manual analysis of the translations of specific documents.
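The idea of preferring translation choices that are semantically similar to the context can be illustrated with a minimal sketch. It is not the Docent feature function described in the paper: the function names, the averaging of context vectors, and the three-dimensional toy vectors are all assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def context_vector(context_words, vectors):
    """Average the vectors of the context words that the model knows (assumed design)."""
    known = [vectors[w] for w in context_words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def rank_candidates(candidates, context_words, vectors):
    """Sort candidate translations by similarity to the averaged context, best first."""
    ctx = context_vector(context_words, vectors)
    scored = []
    for cand in candidates:
        if ctx is None or cand not in vectors:
            score = 0.0  # no evidence either way for unknown words
        else:
            score = cosine(vectors[cand], ctx)
        scored.append((cand, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors, invented for this example (not from a trained model).
TOY_VECTORS = {
    "banco": [0.9, 0.1, 0.0],     # "bank" as a financial institution
    "orilla": [0.0, 0.2, 0.9],    # "bank" as a riverside
    "dinero": [0.8, 0.2, 0.1],
    "préstamo": [0.9, 0.0, 0.1],
    "río": [0.1, 0.1, 0.9],
}

ranking = rank_candidates(["banco", "orilla"], ["dinero", "préstamo"], TOY_VECTORS)
print(ranking[0][0])
```

With a financial context the sketch ranks "banco" first; swapping the context for `["río"]` ranks "orilla" first. A real system would take the vectors from a trained monolingual or bilingual model and combine this score with the decoder's other features.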

Original language: English
Pages: 59-66
Number of pages: 8
Publication status: Published - 1 Jan 2015
Event: 18th Annual Conference of the European Association for Machine Translation, EAMT 2015 - Antalya, Turkey
Duration: 11 May 2015 - 13 May 2015

Conference

Conference: 18th Annual Conference of the European Association for Machine Translation, EAMT 2015
Country: Turkey
City: Antalya
Period: 11/5/15 - 13/5/15

Fingerprint

Semantics
Surface mount technology
Substitution reactions
Concretes
Machine Translation
Semantic Information

ASJC Scopus subject areas

  • Language and Linguistics
  • Software
  • Human-Computer Interaction

Cite this

Garcia, E. M., España-Bonet, C., & Màrquez, L. (2015). Document-level machine translation with word vector models. 59-66. Paper presented at 18th Annual Conference of the European Association for Machine Translation, EAMT 2015, Antalya, Turkey.

@conference{55223a7f39cf45faada249d72d55a0f7,
title = "Document-level machine translation with word vector models",
abstract = "In this paper we apply distributional semantic information to document-level machine translation. We train monolingual and bilingual word vector models on large corpora and evaluate them first on a cross-lingual lexical substitution task and then on the final translation task. For translation, we incorporate the semantic information into a statistical document-level decoder (Docent) by enforcing translation choices that are semantically similar to their context. As expected, the bilingual word vector models are better suited to translation. The final document-level translator with the semantic model outperforms the basic Docent (without semantics) and also slightly outperforms a standard sentence-level SMT system in terms of ULC (the average of a set of standard automatic evaluation metrics for MT). Finally, we present a manual analysis of the translations of specific documents.",
author = "Garcia, {Eva Mart{\'i}nez} and Cristina Espa{\~n}a-Bonet and Llu{\'i}s M{\`a}rquez",
year = "2015",
month = "1",
day = "1",
language = "English",
pages = "59--66",
note = "18th Annual Conference of the European Association for Machine Translation, EAMT 2015 ; Conference date: 11-05-2015 Through 13-05-2015",

}

TY  - CONF
T1  - Document-level machine translation with word vector models
AU  - Garcia, Eva Martínez
AU  - España-Bonet, Cristina
AU  - Màrquez, Lluís
PY  - 2015/1/1
Y1  - 2015/1/1
AB  - In this paper we apply distributional semantic information to document-level machine translation. We train monolingual and bilingual word vector models on large corpora and evaluate them first on a cross-lingual lexical substitution task and then on the final translation task. For translation, we incorporate the semantic information into a statistical document-level decoder (Docent) by enforcing translation choices that are semantically similar to their context. As expected, the bilingual word vector models are better suited to translation. The final document-level translator with the semantic model outperforms the basic Docent (without semantics) and also slightly outperforms a standard sentence-level SMT system in terms of ULC (the average of a set of standard automatic evaluation metrics for MT). Finally, we present a manual analysis of the translations of specific documents.
UR  - http://www.scopus.com/inward/record.url?scp=85001126442&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85001126442&partnerID=8YFLogxK
M3  - Paper
AN  - SCOPUS:85001126442
SP  - 59
EP  - 66
ER  -