Stream decoding for simultaneous spoken language translation

Muntsin Kolss, Stephan Vogel, Alex Waibel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

In the typical speech translation system, the first-best speech recognizer hypothesis is segmented into sentence-like units which are then fed to the downstream machine translation component. The need for a sufficiently large context in this intermediate step and for the MT introduces delays which are undesirable in many application scenarios, such as real-time subtitling of foreign language broadcasts or simultaneous translation of speeches and lectures. In this paper, we propose a statistical machine translation decoder which processes a continuous input stream, such as that produced by a run-on speech recognizer. By decoupling decisions about the timing of translation output generation from any fixed input segmentation, this design can guarantee a maximum output lag for each input word while allowing for full word reordering within this time window. Experimental results show that this system achieves competitive translation performance with a minimum of translation-induced latency.
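The core idea in the abstract — decoupling output timing from input segmentation while guaranteeing a maximum per-word output lag — can be sketched as a toy simulation. The code below is a hypothetical illustration only (a dictionary lookup with one local reordering rule, not the authors' phrase-based SMT decoder); `MAX_LAG`, the lexicon, and the word classes are invented for the demo:

```python
from collections import deque

# Toy sketch of bounded-lag stream decoding: words arrive one at a
# time, may be locally reordered inside the lag window, and are forced
# out before any word exceeds the maximum allowed output lag.

MAX_LAG = 3  # each input word must be translated within 3 input steps

# Hypothetical French -> English lexicon for the demo sentence.
LEXICON = {"la": "the", "maison": "house", "bleue": "blue",
           "est": "is", "grande": "big"}
NOUNS = {"maison"}
ADJECTIVES = {"bleue", "grande"}  # French postposed adjectives

def stream_decode(words):
    buffer = deque()  # (word, arrival_step) pairs not yet emitted
    output = []
    for step, word in enumerate(words):
        buffer.append((word, step))
        # Reordering within the window: move an adjective in front of
        # the noun it follows (French "maison bleue" -> "blue house").
        if (len(buffer) >= 2 and buffer[-1][0] in ADJECTIVES
                and buffer[-2][0] in NOUNS):
            adj, noun = buffer.pop(), buffer.pop()
            buffer.append(adj)
            buffer.append(noun)
        # Force emission from the front as soon as any buffered word
        # would otherwise exceed the maximum allowed output lag.
        while buffer and step - min(s for _, s in buffer) >= MAX_LAG - 1:
            w, _ = buffer.popleft()
            output.append(LEXICON.get(w, w))
    output.extend(LEXICON.get(w, w) for w, _ in buffer)  # flush at end
    return output

print(stream_decode(["la", "maison", "bleue", "est", "grande"]))
# -> ['the', 'blue', 'house', 'is', 'big']
```

Note how "la" is emitted after exactly `MAX_LAG` input steps even though no sentence boundary was ever detected — the lag bound, not a segmenter, triggers output, while reordering remains possible inside the window.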

Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Pages: 2735-2738
Number of pages: 4
Publication status: Published - 2008
Externally published: Yes
Event: INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association - Brisbane, QLD, Australia
Duration: 22 Sep 2008 - 26 Sep 2008

Keywords

  • Decoding
  • Latency
  • Machine translation
  • Real-time
  • Speech translation

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Sensory Systems

Cite this

Kolss, M., Vogel, S., & Waibel, A. (2008). Stream decoding for simultaneous spoken language translation. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 2735-2738).

