Unsupervised extraction of false friends from parallel bi-texts using the Web as a corpus

Svetlin Nakov, Preslav Nakov, Elena Paskaleva

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

False friends are pairs of words in two languages that are perceived as similar but have different meanings, e.g., Gift in German means poison in English. In this paper, we present several unsupervised algorithms for acquiring such pairs from a sentence-aligned bi-text. First, we try different ways of exploiting simple statistics about monolingual word occurrences and cross-lingual word co-occurrences in the bi-text. Second, using methods from statistical machine translation, we induce word alignments in an unsupervised way, from which we estimate lexical translation probabilities, which we use to measure cross-lingual semantic similarity. Third, we experiment with a semantic similarity measure that uses the Web as a corpus to extract local contexts from text snippets returned by a search engine, and a bilingual glossary of known word translation pairs, used as "bridges". Finally, all measures are combined and applied to the task of identifying likely false friends. The evaluation for Russian and Bulgarian shows a significant improvement over previously proposed algorithms.
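The first family of measures, based on word occurrences and cross-lingual co-occurrences in a sentence-aligned bi-text, can be illustrated with a minimal sketch. This record does not give the paper's actual formulas, so the Dice coefficient used here is an assumption, and the function name `cooccurrence_dice` and the toy German-English bi-text are hypothetical illustrations rather than the authors' method or data. The idea is only that an orthographically similar pair whose members rarely appear in aligned sentences together is a candidate false friend.

```python
def cooccurrence_dice(bitext, word_a, word_b):
    """Dice coefficient over aligned sentence pairs (an assumed measure,
    not necessarily the one used in the paper).

    bitext: list of (sentence_in_language_a, sentence_in_language_b) tuples.
    Returns a value in [0, 1]; higher means the two words tend to occur
    in sentences that are translations of each other.
    """
    count_a = count_b = count_both = 0
    for sent_a, sent_b in bitext:
        in_a = word_a in sent_a.lower().split()
        in_b = word_b in sent_b.lower().split()
        count_a += in_a                 # sentences containing word_a
        count_b += in_b                 # sentences containing word_b
        count_both += in_a and in_b     # aligned pairs containing both
    if count_a + count_b == 0:
        return 0.0
    return 2.0 * count_both / (count_a + count_b)


# Toy, invented bi-text built around the Gift/poison example in the abstract.
toy_bitext = [
    ("das gift wirkt langsam", "the poison acts slowly"),
    ("er kaufte ein geschenk", "he bought a gift"),
    ("das gift wirkt schnell", "the poison acts quickly"),
    ("ein geschenk fuer dich", "a gift for you"),
]

false_friend_score = cooccurrence_dice(toy_bitext, "gift", "gift")
translation_score = cooccurrence_dice(toy_bitext, "gift", "poison")
print(false_friend_score, translation_score)
```

On this toy data the look-alike pair never co-occurs in aligned sentences while the true translation pair always does, which is the kind of contrast a co-occurrence measure is meant to expose. A real bi-text would need proper tokenization and a threshold tuned on data.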

Original language: English
Title of host publication: International Conference Recent Advances in Natural Language Processing, RANLP
Pages: 292-298
Number of pages: 7
Publication status: Published - 2009
Externally published: Yes
Event: International Conference on Recent Advances in Natural Language Processing, RANLP-2009 - Borovets, Bulgaria
Duration: 14 Sep 2009 to 16 Sep 2009

Other

Other: International Conference on Recent Advances in Natural Language Processing, RANLP-2009
Country: Bulgaria
City: Borovets
Period: 14/9/09 to 16/9/09

Keywords

  • Cognates
  • Cross-lingual semantic similarity
  • False friends
  • Statistical machine translation
  • Web as a corpus

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Software
  • Electrical and Electronic Engineering

Cite this

Nakov, S., Nakov, P., & Paskaleva, E. (2009). Unsupervised extraction of false friends from parallel bi-texts using the Web as a corpus. In International Conference Recent Advances in Natural Language Processing, RANLP (pp. 292-298).