Nonparametric word segmentation for machine translation

Thuy Linh Nguyen, Stephan Vogel, Noah A. Smith

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

20 Citations (Scopus)

Abstract

We present an unsupervised word segmentation model for machine translation. The model uses existing monolingual segmentation techniques and models the joint distribution over source sentence segmentations and alignments to the target sentence. During inference, the monolingual segmentation model and the bilingual word alignment model are coupled so that the alignments to the target sentence guide the segmentation of the source sentence. The experiments show improvements on Arabic-English and Chinese-English translation tasks.

Original language: English
Title of host publication: Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference
Pages: 815-823
Number of pages: 9
Volume: 2
Publication status: Published - 1 Dec 2010
Externally published: Yes
Event: 23rd International Conference on Computational Linguistics, Coling 2010 - Beijing, China
Duration: 23 Aug 2010 to 27 Aug 2010

Other

Other: 23rd International Conference on Computational Linguistics, Coling 2010
Country: China
City: Beijing
Period: 23/8/10 to 27/8/10

Fingerprint

Word Segmentation
Machine Translation
Segmentation
Experiment
Alignment
Inference
English Translation

ASJC Scopus subject areas

  • Language and Linguistics
  • Computational Theory and Mathematics
  • Linguistics and Language

Cite this

Nguyen, T. L., Vogel, S., & Smith, N. A. (2010). Nonparametric word segmentation for machine translation. In Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference (Vol. 2, pp. 815-823)

Nonparametric word segmentation for machine translation. / Nguyen, Thuy Linh; Vogel, Stephan; Smith, Noah A.

Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference. Vol. 2 2010. p. 815-823.

Nguyen, TL, Vogel, S & Smith, NA 2010, Nonparametric word segmentation for machine translation. in Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference. vol. 2, pp. 815-823, 23rd International Conference on Computational Linguistics, Coling 2010, Beijing, China, 23/8/10.
Nguyen TL, Vogel S, Smith NA. Nonparametric word segmentation for machine translation. In Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference. Vol. 2. 2010. p. 815-823
Nguyen, Thuy Linh ; Vogel, Stephan ; Smith, Noah A. / Nonparametric word segmentation for machine translation. Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference. Vol. 2 2010. pp. 815-823
@inproceedings{1034a4e358854d969b59895bf8253708,
title = "Nonparametric word segmentation for machine translation",
abstract = "We present an unsupervised word segmentation model for machine translation. The model uses existing monolingual segmentation techniques and models the joint distribution over source sentence segmentations and alignments to the target sentence. During inference, the monolingual segmentation model and the bilingual word alignment model are coupled so that the alignments to the target sentence guide the segmentation of the source sentence. The experiments show improvements on Arabic-English and Chinese-English translation tasks.",
author = "Nguyen, {Thuy Linh} and Stephan Vogel and Smith, {Noah A.}",
year = "2010",
month = "12",
day = "1",
language = "English",
volume = "2",
pages = "815--823",
booktitle = "Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference",

}

TY - GEN
T1 - Nonparametric word segmentation for machine translation
AU - Nguyen, Thuy Linh
AU - Vogel, Stephan
AU - Smith, Noah A.
PY - 2010/12/1
Y1 - 2010/12/1
N2 - We present an unsupervised word segmentation model for machine translation. The model uses existing monolingual segmentation techniques and models the joint distribution over source sentence segmentations and alignments to the target sentence. During inference, the monolingual segmentation model and the bilingual word alignment model are coupled so that the alignments to the target sentence guide the segmentation of the source sentence. The experiments show improvements on Arabic-English and Chinese-English translation tasks.
AB - We present an unsupervised word segmentation model for machine translation. The model uses existing monolingual segmentation techniques and models the joint distribution over source sentence segmentations and alignments to the target sentence. During inference, the monolingual segmentation model and the bilingual word alignment model are coupled so that the alignments to the target sentence guide the segmentation of the source sentence. The experiments show improvements on Arabic-English and Chinese-English translation tasks.
UR - http://www.scopus.com/inward/record.url?scp=80053424373&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80053424373&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:80053424373
VL - 2
SP - 815
EP - 823
BT - Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference
ER -