Improved word alignments using the Web as a corpus

Preslav Nakov, Svetlin Nakov, Elena Paskaleva

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Citations (Scopus)

Abstract

We propose a novel method for improving word alignments in a parallel sentence-aligned bilingual corpus, based on the idea that if two words are translations of each other, then many of the words in their local contexts should be translations of each other as well. The idea is formalised using the Web as a corpus, a glossary of known word translations (dynamically augmented from the Web using bootstrapping), the vector space model, linguistically motivated weighted minimum edit distance, competitive linking, and the IBM models. Evaluation results on a Bulgarian-Russian corpus show a sizable improvement in both word alignment and translation quality.
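
Two of the components named in the abstract, weighted minimum edit distance and competitive linking, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the substitution-cost table, the length normalisation, and the threshold are all assumptions made for illustration, and the abstract does not say how string similarity is combined with the Web-based context similarity. Competitive linking is shown here in its common greedy one-to-one form.

```python
from itertools import product


def weighted_edit_distance(a, b, sub_cost=None):
    """Levenshtein distance with an optional per-pair substitution cost.

    sub_cost maps (char_a, char_b) to a cost; unlisted pairs cost 1.0.
    The cost table is hypothetical, not the paper's actual weights.
    """
    sub_cost = sub_cost or {}
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]
        for j, cb in enumerate(b, start=1):
            cost = 0.0 if ca == cb else sub_cost.get((ca, cb), 1.0)
            curr.append(min(prev[j] + 1.0,        # deletion
                            curr[j - 1] + 1.0,    # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def similarity(a, b, sub_cost=None):
    """Turn the distance into a score; 1.0 means identical strings.

    Dividing by the longer length is an assumed normalisation.
    """
    longest = max(len(a), len(b)) or 1
    return 1.0 - weighted_edit_distance(a, b, sub_cost) / longest


def competitive_linking(src_words, tgt_words, score, threshold=0.0):
    """Greedy one-to-one linking.

    Repeatedly accept the highest-scoring unlinked (source, target) pair
    until no remaining pair scores above the threshold.
    Returns a sorted list of (source_index, target_index) links.
    """
    candidates = sorted(
        ((score(s, t), i, j)
         for (i, s), (j, t) in product(enumerate(src_words),
                                       enumerate(tgt_words))),
        key=lambda c: (-c[0], c[1], c[2]),
    )
    used_src, used_tgt, links = set(), set(), []
    for sc, i, j in candidates:
        if sc <= threshold:
            break
        if i not in used_src and j not in used_tgt:
            links.append((i, j))
            used_src.add(i)
            used_tgt.add(j)
    return sorted(links)


if __name__ == "__main__":
    # Toy strings only; real input would be the two sides of a sentence pair.
    print(competitive_linking(["abc", "xyz"], ["xyz", "abd"], similarity))
```

Any function that returns a number for a word pair can be passed as `score`, so a context-based similarity (for example, a cosine between context vectors) could be swapped in or mixed with the string score; the mixing weights would be a further assumption.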

Original language: English
Title of host publication: International Conference Recent Advances in Natural Language Processing, RANLP
Publisher: Association for Computational Linguistics (ACL)
Pages: 400-405
Number of pages: 6
Volume: 2007-January
ISBN (Print): 9789549174373
Publication status: Published - 2007
Externally published: Yes
Event: International Conference Recent Advances in Natural Language Processing, RANLP 2007 - Borovets, Bulgaria
Duration: 27 Sep 2007 to 29 Sep 2007

Other

Other: International Conference Recent Advances in Natural Language Processing, RANLP 2007
Country: Bulgaria
City: Borovets
Period: 27/9/07 to 29/9/07

Fingerprint

Vector spaces
Glossaries

Keywords

  • Competitive linking
  • Edit distance
  • Machine translation
  • String similarity
  • Web as a corpus
  • Word alignments

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Software
  • Electrical and Electronic Engineering

Cite this

Nakov, P., Nakov, S., & Paskaleva, E. (2007). Improved word alignments using the Web as a corpus. In International Conference Recent Advances in Natural Language Processing, RANLP (Vol. 2007-January, pp. 400-405). Association for Computational Linguistics (ACL).

@inproceedings{47271595d7b0449ab5a0fccf338fda6f,
title = "Improved word alignments using the Web as a corpus",
abstract = "We propose a novel method for improving word alignments in a parallel sentence-aligned bilingual corpus based on the idea that if two words are translations of each other then so should be many words in their local contexts. The idea is formalised using the Web as a corpus, a glossary of known word translations (dynamically augmented from the Web using bootstrapping), the vector space model, linguistically motivated weighted minimum edit distance, competitive linking, and the IBM models. Evaluation results on a Bulgarian-Russian corpus show a sizable improvement both in word alignment and in translation quality.",
keywords = "Competitive linking, Edit distance, Machine translation, String similarity, Web as a corpus, Word alignments",
author = "Preslav Nakov and Svetlin Nakov and Elena Paskaleva",
year = "2007",
language = "English",
isbn = "9789549174373",
volume = "2007-January",
pages = "400--405",
booktitle = "International Conference Recent Advances in Natural Language Processing, RANLP",
publisher = "Association for Computational Linguistics (ACL)",

}

TY - GEN

T1 - Improved word alignments using the Web as a corpus

AU - Nakov, Preslav

AU - Nakov, Svetlin

AU - Paskaleva, Elena

PY - 2007

Y1 - 2007

N2 - We propose a novel method for improving word alignments in a parallel sentence-aligned bilingual corpus based on the idea that if two words are translations of each other then so should be many words in their local contexts. The idea is formalised using the Web as a corpus, a glossary of known word translations (dynamically augmented from the Web using bootstrapping), the vector space model, linguistically motivated weighted minimum edit distance, competitive linking, and the IBM models. Evaluation results on a Bulgarian-Russian corpus show a sizable improvement both in word alignment and in translation quality.

KW - Competitive linking

KW - Edit distance

KW - Machine translation

KW - String similarity

KW - Web as a corpus

KW - Word alignments

UR - http://www.scopus.com/inward/record.url?scp=84866864359&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84866864359&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84866864359

SN - 9789549174373

VL - 2007-January

SP - 400

EP - 405

BT - International Conference Recent Advances in Natural Language Processing, RANLP

PB - Association for Computational Linguistics (ACL)

ER -