Adaptive parallel sentences mining from web bilingual news collection

Bing Zhao, Stephan Vogel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

51 Citations (Scopus)

Abstract

In this paper a robust, adaptive approach for mining parallel sentences from a bilingual comparable news collection is described. Sentence length models and lexicon-based models are combined under a maximum likelihood criterion. Specific models are proposed to handle insertions and deletions that are frequent in bilingual data collected from the web. The proposed approach is adaptive, updating the translation lexicon iteratively using the mined parallel data to get better vocabulary coverage and translation probability parameter estimation. Experiments are carried out on 10 years of Xinhua bilingual news collection. Using the mined data, we get significant improvement in word-to-word alignment accuracy in machine translation modeling.

Original language: English
Title of host publication: Proceedings - 2002 IEEE International Conference on Data Mining, ICDM 2002
Pages: 745-748
Number of pages: 4
Publication status: Published - 1 Dec 2002
Event: 2nd IEEE International Conference on Data Mining, ICDM '02 - Maebashi, Japan
Duration: 9 Dec 2002 → 12 Dec 2002

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Other

Other: 2nd IEEE International Conference on Data Mining, ICDM '02
Country: Japan
City: Maebashi
Period: 9/12/02 → 12/12/02

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Zhao, B., & Vogel, S. (2002). Adaptive parallel sentences mining from web bilingual news collection. In Proceedings - 2002 IEEE International Conference on Data Mining, ICDM 2002 (pp. 745-748). (Proceedings - IEEE International Conference on Data Mining, ICDM).