Adaptive parallel sentences mining from web bilingual news collection

Bing Zhao, Stephan Vogel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

50 Citations (Scopus)

Abstract

In this paper a robust, adaptive approach for mining parallel sentences from a bilingual comparable news collection is described. Sentence length models and lexicon-based models are combined under a maximum likelihood criterion. Specific models are proposed to handle insertions and deletions that are frequent in bilingual data collected from the web. The proposed approach is adaptive, updating the translation lexicon iteratively using the mined parallel data to get better vocabulary coverage and translation probability parameter estimation. Experiments are carried out on 10 years of Xinhua bilingual news collection. Using the mined data, we get significant improvement in word-to-word alignment accuracy in machine translation modeling.
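
The abstract does not give the model equations, so the following is only a rough sketch of the general idea, not the authors' method. It assumes a Gaussian length model and an IBM Model 1 style lexicon score, combined by a simple weighted sum of log-probabilities, followed by one re-estimation step of the lexicon from mined pairs. All function names, the `weight`, `ratio`, `var` and `floor` parameters, and the toy lexicon are hypothetical.

```python
import math
from collections import defaultdict


def length_log_prob(src_len, tgt_len, ratio=1.0, var=6.8):
    """Gaussian length model (assumed form): log-density of the length mismatch."""
    if src_len == 0:
        return float("-inf")
    delta = (tgt_len - ratio * src_len) / math.sqrt(var * src_len)
    return -0.5 * delta * delta - 0.5 * math.log(2 * math.pi)


def lexicon_log_prob(src, tgt, lexicon, floor=1e-6):
    """IBM Model 1 style log P(tgt | src), averaged per target word."""
    if not src or not tgt:
        return float("-inf")
    total = 0.0
    for t in tgt:
        p = sum(lexicon.get((s, t), 0.0) for s in src) / len(src)
        total += math.log(max(p, floor))  # floor avoids log(0) for unseen words
    return total / len(tgt)


def pair_score(src, tgt, lexicon, weight=0.5):
    """Weighted sum of the two log-scores (the weighting is an assumption)."""
    return (weight * length_log_prob(len(src), len(tgt))
            + (1.0 - weight) * lexicon_log_prob(src, tgt, lexicon))


def update_lexicon(pairs, lexicon, floor=1e-6):
    """One EM-like step: re-estimate t(tgt | src) from mined sentence pairs."""
    counts = defaultdict(float)
    totals = defaultdict(float)
    for src, tgt in pairs:
        for t in tgt:
            norm = sum(lexicon.get((s, t), floor) for s in src)
            for s in src:
                c = lexicon.get((s, t), floor) / norm  # expected alignment count
                counts[(s, t)] += c
                totals[s] += c
    return {(s, t): c / totals[s] for (s, t), c in counts.items()}


# Toy data, purely illustrative.
lexicon = {("house", "maison"): 0.8, ("the", "la"): 0.7}
src = ["the", "house"]
tgt_good = ["la", "maison"]
tgt_bad = ["chien", "noir", "grand", "petit"]

good = pair_score(src, tgt_good, lexicon)
bad = pair_score(src, tgt_bad, lexicon)
new_lexicon = update_lexicon([(src, tgt_good)], lexicon)
```

In this sketch a candidate pair would be kept when `pair_score` exceeds some threshold, and `update_lexicon` would be rerun on the kept pairs in each iteration, which mirrors the adaptive loop the abstract describes only at a high level.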

Original language: English
Title of host publication: Proceedings - IEEE International Conference on Data Mining, ICDM
Pages: 745-748
Number of pages: 4
Publication status: Published - 1 Dec 2002
Externally published: Yes
Event: 2nd IEEE International Conference on Data Mining, ICDM '02 - Maebashi, Japan
Duration: 9 Dec 2002 to 12 Dec 2002

Other

Other: 2nd IEEE International Conference on Data Mining, ICDM '02
Country: Japan
City: Maebashi
Period: 9/12/02 to 12/12/02

Fingerprint

Parameter estimation
Maximum likelihood
Experiments

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Zhao, B., & Vogel, S. (2002). Adaptive parallel sentences mining from web bilingual news collection. In Proceedings - IEEE International Conference on Data Mining, ICDM (pp. 745-748)

@inproceedings{0919e58babc946219dc221d47f5f4f11,
title = "Adaptive parallel sentences mining from web bilingual news collection",
abstract = "In this paper a robust, adaptive approach for mining parallel sentences from a bilingual comparable news collection is described. Sentence length models and lexicon-based models are combined under a maximum likelihood criterion. Specific models are proposed to handle insertions and deletions that are frequent in bilingual data collected from the web. The proposed approach is adaptive, updating the translation lexicon iteratively using the mined parallel data to get better vocabulary coverage and translation probability parameter estimation. Experiments are carried out on 10 years of Xinhua bilingual news collection. Using the mined data, we get significant improvement in word-to-word alignment accuracy in machine translation modeling.",
author = "Bing Zhao and Stephan Vogel",
year = "2002",
month = "12",
day = "1",
language = "English",
isbn = "0769517544",
pages = "745--748",
booktitle = "Proceedings - IEEE International Conference on Data Mining, ICDM",

}

TY - GEN

T1 - Adaptive parallel sentences mining from web bilingual news collection

AU - Zhao, Bing

AU - Vogel, Stephan

PY - 2002/12/1

Y1 - 2002/12/1

N2 - In this paper a robust, adaptive approach for mining parallel sentences from a bilingual comparable news collection is described. Sentence length models and lexicon-based models are combined under a maximum likelihood criterion. Specific models are proposed to handle insertions and deletions that are frequent in bilingual data collected from the web. The proposed approach is adaptive, updating the translation lexicon iteratively using the mined parallel data to get better vocabulary coverage and translation probability parameter estimation. Experiments are carried out on 10 years of Xinhua bilingual news collection. Using the mined data, we get significant improvement in word-to-word alignment accuracy in machine translation modeling.

AB - In this paper a robust, adaptive approach for mining parallel sentences from a bilingual comparable news collection is described. Sentence length models and lexicon-based models are combined under a maximum likelihood criterion. Specific models are proposed to handle insertions and deletions that are frequent in bilingual data collected from the web. The proposed approach is adaptive, updating the translation lexicon iteratively using the mined parallel data to get better vocabulary coverage and translation probability parameter estimation. Experiments are carried out on 10 years of Xinhua bilingual news collection. Using the mined data, we get significant improvement in word-to-word alignment accuracy in machine translation modeling.

UR - http://www.scopus.com/inward/record.url?scp=1542596543&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=1542596543&partnerID=8YFLogxK

M3 - Conference contribution

SN - 0769517544

SN - 9780769517544

SP - 745

EP - 748

BT - Proceedings - IEEE International Conference on Data Mining, ICDM

ER -