Cross-lingual lexical language discovery from audio data using multiple translations

F. Stahlberg, T. Schlippe, Stephan Vogel, T. Schultz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Zero-resource Automatic Speech Recognition (ZR ASR) addresses target languages for which no pronunciation dictionary, transcribed speech, or language model is available. Lexical discovery for ZR ASR aims to extract word-like chunks from speech, and it benefits from the availability of written translations in another source language [1, 2, 3]. In this paper, we improve lexical discovery further by combining multiple source languages. We present a novel method for combining noisy word segmentations that yields up to 11.2% relative F-score gain. When we extract word pronunciations from the combined segmentations to bootstrap an ASR system, we improve accuracy by 9.1% relative compared to the best system with only one translation, and by 50.1% compared to monolingual lexical discovery.
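The abstract does not spell out how the segmentations are combined. As a rough illustration of what "combining noisy word segmentations" and a segmentation F-score can mean, the sketch below represents each segmentation as the set of word-boundary positions in a phoneme string, merges several of them by simple majority voting over boundaries, and scores the result with a boundary F-score. Majority voting is a swapped-in baseline, not the paper's novel method; the names `majority_vote`, `boundary_f_score` and `relative_gain`, the boundary-set representation, and the toy boundaries are all hypothetical, and the paper's F-score may be defined over word tokens rather than boundaries.

```python
from collections import Counter
from typing import List, Set

# Hypothetical representation: a segmentation of a phoneme sequence of length n
# is the set of positions i (0 < i < n) where a word boundary precedes phoneme i.
Segmentation = Set[int]


def majority_vote(segmentations: List[Segmentation], min_votes: int) -> Segmentation:
    """Keep every boundary proposed by at least `min_votes` input segmentations.

    A simple baseline combination, NOT the method proposed in the paper.
    """
    votes = Counter(b for seg in segmentations for b in seg)
    return {b for b, count in votes.items() if count >= min_votes}


def boundary_f_score(hypothesis: Segmentation, reference: Segmentation) -> float:
    """Harmonic mean of boundary precision and boundary recall."""
    hits = len(hypothesis & reference)
    if hits == 0:
        return 0.0  # also covers empty hypothesis or reference sets
    precision = hits / len(hypothesis)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)


def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


if __name__ == "__main__":
    # Toy boundaries (not data from the paper): one reference segmentation and
    # three noisy segmentations, e.g. one per source-language translation.
    reference = {3, 5, 9}
    noisy = [{3, 6, 9}, {3, 5, 7}, {2, 5, 9}]
    best_single = max(boundary_f_score(seg, reference) for seg in noisy)
    combined = boundary_f_score(majority_vote(noisy, min_votes=2), reference)
    print(best_single, combined, relative_gain(combined, best_single))
```

In this toy case each noisy segmentation gets two of three boundaries right, but their errors differ, so a two-of-three vote recovers the reference exactly. This is the intuition behind combining several translations; real segmentations are far noisier.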

Original language: English
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5823-5827
Number of pages: 5
Volume: 2015-August
ISBN (Print): 9781467369978
DOIs: https://doi.org/10.1109/ICASSP.2015.7179088
Publication status: Published - 4 Aug 2015
Event: 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Brisbane, Australia
Duration: 19 Apr 2015 → 24 Apr 2015



Keywords

  • Lexical language discovery
  • non-written languages
  • word-to-phoneme alignment
  • zero-resource automatic speech recognition

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Stahlberg, F., Schlippe, T., Vogel, S., & Schultz, T. (2015). Cross-lingual lexical language discovery from audio data using multiple translations. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. 2015-August, pp. 5823-5827). [7179088] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2015.7179088