Integrated phrase segmentation and alignment algorithm for statistical machine translation

Ying Zhang, Stephan Vogel, Alex Waibel

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

27 Citations (Scopus)

Abstract

We present an integrated phrase segmentation/alignment algorithm (ISA) for Statistical Machine Translation. Unlike other methods, ISA requires neither an initial word-to-word alignment nor a prior segmentation of the monolingual text into phrases; it segments the sentences into phrases and finds their alignments simultaneously. For each sentence pair, ISA builds a two-dimensional matrix in which the value of each cell is the Point-wise Mutual Information (MI) between the corresponding source and target words. Aligned phrase pairs are identified from the similarity of MI values among cells. Once all the phrase pairs are found, we know both how to segment a sentence into phrases and how the source and target sentences align. We use monolingual bigram language models to estimate the joint probabilities of the identified phrase pairs. The joint probabilities are then normalized to conditional probabilities, which are used by the decoder. Despite its simplicity, this approach yields phrase-to-phrase translations with significantly higher precision than our baseline system, in which phrase translations are extracted from the HMM word alignment. When we combine the phrase-to-phrase translations generated by this algorithm with the baseline system, the improvement in translation quality is even larger.
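The two steps of the abstract that are fully specified, the per-sentence MI matrix and the normalization of joint to conditional probabilities, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the sentence-level co-occurrence counting, the use of negative infinity for unseen word pairs, and the direction of the conditional (target given source) are all assumptions. The step that groups cells with similar MI values into phrase pairs, and the bigram language model estimate of the joint probabilities, are not shown because the abstract does not describe them in enough detail.

```python
import math
from collections import Counter


def collect_counts(corpus):
    """Count in how many sentence pairs each word and each word pair occurs.

    Sentence-level co-occurrence is an assumption made for this sketch.
    """
    src_counts, tgt_counts, pair_counts = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        s_set, t_set = set(src), set(tgt)
        src_counts.update(s_set)
        tgt_counts.update(t_set)
        pair_counts.update((s, t) for s in s_set for t in t_set)
    return src_counts, tgt_counts, pair_counts


def pmi_matrix(src, tgt, pair_counts, src_counts, tgt_counts, n_pairs):
    """Return a len(src) x len(tgt) matrix of point-wise MI values (log base 2)."""
    matrix = []
    for s in src:
        row = []
        for t in tgt:
            joint = pair_counts.get((s, t), 0)
            if joint == 0:
                # Assumption: word pairs never seen together get -inf.
                row.append(float("-inf"))
                continue
            p_st = joint / n_pairs
            p_s = src_counts[s] / n_pairs
            p_t = tgt_counts[t] / n_pairs
            row.append(math.log2(p_st / (p_s * p_t)))
        matrix.append(row)
    return matrix


def normalize_joint(joint):
    """Turn joint scores P(src_phrase, tgt_phrase) into P(tgt_phrase | src_phrase).

    The conditioning direction is an assumption; the abstract does not state it.
    """
    totals = Counter()
    for (s, _), p in joint.items():
        totals[s] += p
    return {(s, t): p / totals[s] for (s, t), p in joint.items()}


if __name__ == "__main__":
    # Hypothetical toy corpus, used only to exercise the functions.
    corpus = [
        (["das", "haus"], ["the", "house"]),
        (["das", "buch"], ["the", "book"]),
        (["ein", "haus"], ["a", "house"]),
    ]
    src_c, tgt_c, pair_c = collect_counts(corpus)
    src, tgt = corpus[0]
    for word, row in zip(src, pmi_matrix(src, tgt, pair_c, src_c, tgt_c, len(corpus))):
        print(word, [round(v, 3) for v in row])
```

On the toy corpus, the cells for the word pairs that co-occur most consistently receive the highest MI values in their rows, which is the kind of signal the abstract says ISA uses to identify aligned phrase pairs.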

Original language: English
Title of host publication: NLP-KE 2003 - 2003 International Conference on Natural Language Processing and Knowledge Engineering, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 567-573
Number of pages: 7
ISBN (Print): 0780379020, 9780780379022
DOI: https://doi.org/10.1109/NLPKE.2003.1275970
Publication status: Published - 2003
Externally published: Yes
Event: International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2003 - Beijing, China
Duration: 26 Oct 2003 to 29 Oct 2003

Other

Other: International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2003
Country: China
City: Beijing
Period: 26/10/03 to 29/10/03

Keywords

  • Phrase alignment
  • Phrase segmentation
  • Statistical machine translation

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computational Theory and Mathematics
  • Software

Cite this

Zhang, Y., Vogel, S., & Waibel, A. (2003). Integrated phrase segmentation and alignment algorithm for statistical machine translation. In NLP-KE 2003 - 2003 International Conference on Natural Language Processing and Knowledge Engineering, Proceedings (pp. 567-573). [1275970] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/NLPKE.2003.1275970

@inproceedings{6d2ddc1420bb4c6aadaa02cb3efd07f8,
title = "Integrated phrase segmentation and alignment algorithm for statistical machine translation",
keywords = "Phrase alignment, Phrase segmentation, Statistical machine translation",
author = "Ying Zhang and Stephan Vogel and Alex Waibel",
year = "2003",
doi = "10.1109/NLPKE.2003.1275970",
language = "English",
isbn = "0780379020",
pages = "567--573",
booktitle = "NLP-KE 2003 - 2003 International Conference on Natural Language Processing and Knowledge Engineering, Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY  - GEN
T1  - Integrated phrase segmentation and alignment algorithm for statistical machine translation
AU  - Zhang, Ying
AU  - Vogel, Stephan
AU  - Waibel, Alex
PY  - 2003
Y1  - 2003
KW  - Phrase alignment
KW  - Phrase segmentation
KW  - Statistical machine translation
UR  - http://www.scopus.com/inward/record.url?scp=84945115561&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84945115561&partnerID=8YFLogxK
U2  - 10.1109/NLPKE.2003.1275970
DO  - 10.1109/NLPKE.2003.1275970
M3  - Conference contribution
SN  - 0780379020
SN  - 9780780379022
SP  - 567
EP  - 573
BT  - NLP-KE 2003 - 2003 International Conference on Natural Language Processing and Knowledge Engineering, Proceedings
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  -