Integrated phrase segmentation and alignment algorithm for statistical machine translation

Ying Zhang, Stephan Vogel, Alex Waibel

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

27 Citations (Scopus)

Abstract

We present an integrated phrase segmentation/alignment algorithm (ISA) for Statistical Machine Translation. Unlike other methods, it does not need to build an initial word-to-word alignment or to segment the monolingual text into phrases first; instead, it segments the sentences into phrases and finds their alignments simultaneously. For each sentence pair, ISA builds a two-dimensional matrix in which the value of each cell is the point-wise Mutual Information (MI) between a source word and a target word. Based on the similarities of MI values among cells, we identify the aligned phrase pairs. Once all the phrase pairs are found, we know both how to segment each sentence into phrases and how the source and target sentences align. We use monolingual bigram language models to estimate the joint probabilities of the identified phrase pairs. The joint probabilities are then normalized to conditional probabilities, which are used by the decoder. Despite its simplicity, this approach yields phrase-to-phrase translations with significantly higher precision than our baseline system, in which phrase translations are extracted from the HMM word alignment. When we combine the phrase-to-phrase translations generated by this algorithm with those of the baseline system, the improvement in translation quality is even larger.
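As an illustration of two steps named in the abstract, the sketch below builds a point-wise MI matrix for one sentence pair and normalizes joint phrase-pair probabilities to conditional ones. It is not the authors' implementation: the abstract does not say how MI is estimated, so sentence-level co-occurrence counts are an assumption here, and the function names pmi_matrix and to_conditional, as well as the toy corpus, are hypothetical. The step that identifies phrase pairs from similar MI values is not shown.

```python
import math
from collections import Counter, defaultdict


def pmi_matrix(source, target, corpus):
    """Return a len(source) x len(target) matrix of point-wise MI values.

    corpus is a list of (source_tokens, target_tokens) sentence pairs.
    Assumption: probabilities are estimated from sentence-level
    co-occurrence counts; the abstract does not specify the estimator.
    """
    n = len(corpus)
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        src_set, tgt_set = set(src), set(tgt)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        pair_count.update((s, t) for s in src_set for t in tgt_set)

    matrix = []
    for s in source:
        row = []
        for t in target:
            joint = pair_count[(s, t)]
            if joint == 0:
                # Words never seen together: MI is undefined, use -inf.
                row.append(float("-inf"))
            else:
                # log( p(s,t) / (p(s) p(t)) ), each probability = count / n
                row.append(math.log(joint * n / (src_count[s] * tgt_count[t])))
        matrix.append(row)
    return matrix


def to_conditional(joint):
    """Normalize joint scores {(src_phrase, tgt_phrase): p} to p(tgt | src)."""
    totals = defaultdict(float)
    for (src, _), p in joint.items():
        totals[src] += p
    return {(src, tgt): p / totals[src] for (src, tgt), p in joint.items()}


# Toy data, invented for this example only.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "haus"], ["a", "house"]),
]
matrix = pmi_matrix(["das", "haus"], ["the", "house"], corpus)
conditional = to_conditional({("a", "x"): 0.2, ("a", "y"): 0.6, ("b", "x"): 0.1})
```

In this toy corpus the cells for (das, the) and (haus, house) receive higher MI than the off-diagonal cells, which is the kind of contrast the algorithm relies on when grouping cells into aligned phrase pairs.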

Original language: English
Title of host publication: NLP-KE 2003 - 2003 International Conference on Natural Language Processing and Knowledge Engineering, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 567-573
Number of pages: 7
ISBN (Print): 0780379020, 9780780379022
DOIs: https://doi.org/10.1109/NLPKE.2003.1275970
Publication status: Published - 2003
Externally published: Yes
Event: International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2003 - Beijing, China
Duration: 26 Oct 2003 to 29 Oct 2003

Other

Other: International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2003
Country: China
City: Beijing
Period: 26/10/03 to 29/10/03

Keywords

  • Phrase alignment
  • Phrase segmentation
  • Statistical machine translation

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computational Theory and Mathematics
  • Software

Cite this

Zhang, Y., Vogel, S., & Waibel, A. (2003). Integrated phrase segmentation and alignment algorithm for statistical machine translation. In NLP-KE 2003 - 2003 International Conference on Natural Language Processing and Knowledge Engineering, Proceedings (pp. 567-573). [1275970] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/NLPKE.2003.1275970