Sentence segmentation and punctuation recovery for spoken language translation

Matthias Paulik, Sharath Rao, Ian Lane, Stephan Vogel, Tanja Schultz

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

14 Citations (Scopus)

Abstract

Sentence segmentation and punctuation recovery are critical components for effective spoken language translation (SLT). In this paper we describe our recent work on sentence segmentation and punctuation recovery for three different language pairs, namely for English-to-Spanish, Arabic-to-English and Chinese-to-English. We show that the proposed approach works equally well in these very different language pairs. Furthermore, we introduce two features computed from the translation beam-search lattice that indicate if phrasal and target language model context is jeopardized when segmenting at a given word boundary. These features enable us to introduce short intra-sentence segments without degrading translation performance.

Original language: English
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Pages: 5105-5108
Number of pages: 4
DOI: https://doi.org/10.1109/ICASSP.2008.4518807
Publication status: Published - 16 Sep 2008
Externally published: Yes
Event: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP - Las Vegas, NV, United States
Duration: 31 Mar 2008 to 4 Apr 2008

Keywords

  • Punctuation Recovery
  • Sentence Segmentation
  • Spoken Language Translation
  • Tight Coupling

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Signal Processing
  • Acoustics and Ultrasonics

Cite this

Paulik, M., Rao, S., Lane, I., Vogel, S., & Schultz, T. (2008). Sentence segmentation and punctuation recovery for spoken language translation. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (pp. 5105-5108). [4518807] https://doi.org/10.1109/ICASSP.2008.4518807

@inproceedings{40ea795558d244909ac444f19e281c83,
title = "Sentence segmentation and punctuation recovery for spoken language translation",
abstract = "Sentence segmentation and punctuation recovery are critical components for effective spoken language translation (SLT). In this paper we describe our recent work on sentence segmentation and punctuation recovery for three different language pairs, namely for English-to-Spanish, Arabic-to-English and Chinese-to-English. We show that the proposed approach works equally well in these very different language pairs. Furthermore, we introduce two features computed from the translation beam-search lattice that indicate if phrasal and target language model context is jeopardized when segmenting at a given word boundary. These features enable us to introduce short intra-sentence segments without degrading translation performance.",
keywords = "Punctuation Recovery, Sentence Segmentation, Spoken Language Translation, Tight Coupling",
author = "Matthias Paulik and Sharath Rao and Ian Lane and Stephan Vogel and Tanja Schultz",
year = "2008",
month = "9",
day = "16",
doi = "10.1109/ICASSP.2008.4518807",
language = "English",
isbn = "1424414849",
pages = "5105--5108",
booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

}

TY  - GEN
T1  - Sentence segmentation and punctuation recovery for spoken language translation
AU  - Paulik, Matthias
AU  - Rao, Sharath
AU  - Lane, Ian
AU  - Vogel, Stephan
AU  - Schultz, Tanja
PY  - 2008/9/16
Y1  - 2008/9/16
AB  - Sentence segmentation and punctuation recovery are critical components for effective spoken language translation (SLT). In this paper we describe our recent work on sentence segmentation and punctuation recovery for three different language pairs, namely for English-to-Spanish, Arabic-to-English and Chinese-to-English. We show that the proposed approach works equally well in these very different language pairs. Furthermore, we introduce two features computed from the translation beam-search lattice that indicate if phrasal and target language model context is jeopardized when segmenting at a given word boundary. These features enable us to introduce short intra-sentence segments without degrading translation performance.
KW  - Punctuation Recovery
KW  - Sentence Segmentation
KW  - Spoken Language Translation
KW  - Tight Coupling
UR  - http://www.scopus.com/inward/record.url?scp=51449090378&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=51449090378&partnerID=8YFLogxK
U2  - 10.1109/ICASSP.2008.4518807
DO  - 10.1109/ICASSP.2008.4518807
M3  - Conference contribution
AN  - SCOPUS:51449090378
SN  - 1424414849
SN  - 9781424414840
SP  - 5105
EP  - 5108
BT  - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ER  -