Cross-lingual lexical language discovery from audio data using multiple translations

F. Stahlberg, T. Schlippe, Stephan Vogel, T. Schultz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Zero-resource Automatic Speech Recognition (ZR ASR) addresses target languages for which no pronunciation dictionary, transcribed speech, or language model is available. Lexical discovery for ZR ASR aims to extract word-like chunks from speech. Lexical discovery benefits from the availability of written translations in another source language [1, 2, 3]. In this paper, we improve lexical discovery further by combining multiple source languages. We present a novel method for combining noisy word segmentations that yields up to 11.2% relative F-score gain. When we extract word pronunciations from the combined segmentations to bootstrap an ASR system, we improve accuracy by 9.1% relative compared to the best system with only one translation, and by 50.1% compared to monolingual lexical discovery.
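
The abstract refers to combining noisy word segmentations obtained from several source-language translations, but does not spell out the combination scheme. The short Python sketch below only illustrates one plausible way such segmentations could be merged, by majority voting over word-boundary positions; the function name combine_segmentations, the boundary-set representation, and the threshold parameter are illustrative assumptions, not the authors' actual method.

```python
# Hypothetical illustration only (not the method from the paper): merge several
# noisy word segmentations of the same phoneme sequence by majority voting over
# word-boundary positions.
from typing import List, Set

def combine_segmentations(phonemes: List[str],
                          segmentations: List[Set[int]],
                          threshold: float = 0.5) -> List[List[str]]:
    """Keep a boundary if at least `threshold` of the segmentations propose it.

    A boundary at index i splits the sequence between phonemes i-1 and i.
    """
    n = len(segmentations)
    kept = [i for i in range(1, len(phonemes))
            if sum(i in seg for seg in segmentations) / n >= threshold]
    # Cut the phoneme sequence at the agreed boundaries.
    words, start = [], 0
    for boundary in kept + [len(phonemes)]:
        words.append(phonemes[start:boundary])
        start = boundary
    return words

if __name__ == "__main__":
    # Toy example: three "source languages" propose slightly different boundaries.
    phones = ["k", "a", "t", "s", "i", "t", "s"]
    segs = [{3, 5}, {3}, {3, 5}]
    print(combine_segmentations(phones, segs))
    # -> [['k', 'a', 't'], ['s', 'i'], ['t', 's']]
```

In the paper, the individual segmentations would be derived from word-to-phoneme alignments against different written translations; here they are hand-made sets purely for illustration.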

Original language: English
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5823-5827
Number of pages: 5
Volume: 2015-August
ISBN (Print): 9781467369978
DOI: 10.1109/ICASSP.2015.7179088
Publication status: Published - 4 Aug 2015
Event: 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Brisbane, Australia
Duration: 19 Apr 2015 - 24 Apr 2015

Other

Other: 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015
Country: Australia
City: Brisbane
Period: 19/4/15 - 24/4/15

Keywords

  • Lexical language discovery
  • non-written languages
  • word-to-phoneme alignment
  • zero-resource automatic speech recognition

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Stahlberg, F., Schlippe, T., Vogel, S., & Schultz, T. (2015). Cross-lingual lexical language discovery from audio data using multiple translations. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. 2015-August, pp. 5823-5827). [7179088] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2015.7179088
