Pronunciation extraction from phoneme sequences through cross-lingual word-to-phoneme alignment

Felix Stahlberg, Tim Schlippe, Stephan Vogel, Tanja Schultz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Using written translations in a source language, we cross-lingually segment phoneme sequences in a target language into word units with our new alignment model, Model 3P [17]. From this segmentation, we deduce phonetic transcriptions of target-language words, introduce the vocabulary in terms of word IDs, and extract a pronunciation dictionary. Our approach is highly relevant for bootstrapping dictionaries from audio data for Automatic Speech Recognition and for bypassing the written form in Speech-to-Speech Translation, particularly for under-resourced languages and languages that are not written at all. An analysis of 14 translations in 9 languages, used to build a dictionary for English, shows that the quality of the resulting dictionary is better when the vocabulary sizes of the source and target languages are close, sentences are shorter, words are repeated more often, and translations are formally equivalent.
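The extraction step described in the abstract can be illustrated with a minimal Python sketch. It is not Model 3P and does not compute alignments; it assumes a word-to-phoneme alignment is already given as one source-word index per target phoneme. The function names, the toy German/English data, and the counting of candidate pronunciations are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import groupby


def segment_by_alignment(phonemes, alignment):
    """Split a target phoneme sequence into word units.

    alignment[i] is the index of the source-language word that phoneme i
    is aligned to (assumed to be given). Consecutive phonemes that share
    an index form one word unit.
    """
    if len(phonemes) != len(alignment):
        raise ValueError("phonemes and alignment must have equal length")
    segments = []
    pairs = zip(alignment, phonemes)
    for src_index, group in groupby(pairs, key=lambda pair: pair[0]):
        segments.append((src_index, tuple(ph for _, ph in group)))
    return segments


def extract_dictionary(corpus):
    """Count the phoneme segments aligned to each source word.

    corpus: iterable of (source_words, phonemes, alignment) triples.
    Returns {source_word: Counter mapping phoneme tuples to counts}.
    """
    counts = defaultdict(Counter)
    for source_words, phonemes, alignment in corpus:
        for src_index, segment in segment_by_alignment(phonemes, alignment):
            counts[source_words[src_index]][segment] += 1
    return counts


# Hypothetical toy data: German written translations (source) aligned to
# English phoneme sequences (target).
toy_corpus = [
    (["das", "haus"], ["DH", "AH", "HH", "AW", "S"], [0, 0, 1, 1, 1]),
    (["ein", "haus"], ["AH", "HH", "AW", "S"], [0, 1, 1, 1]),
]

dictionary = extract_dictionary(toy_corpus)
for word, pronunciations in dictionary.items():
    best, count = pronunciations.most_common(1)[0]
    print(word, " ".join(best), count)
```

In this sketch the source word stands in for the word ID of the target word unit, and the most frequent segment is taken as its pronunciation; a real system would have to handle alignment errors and competing pronunciation variants.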

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 260-272
Number of pages: 13
Volume: 7978 LNAI
DOI: https://doi.org/10.1007/978-3-642-39593-2_23
Publication status: Published - 3 Sep 2013
Event: 1st International Conference on Statistical Language and Speech Processing, SLSP 2013 - Tarragona, Spain
Duration: 29 Jul 2013 to 31 Jul 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7978 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 1st International Conference on Statistical Language and Speech Processing, SLSP 2013
Country: Spain
City: Tarragona
Period: 29/7/13 to 31/7/13

Keywords

  • pronunciation dictionary
  • speech-to-speech translation
  • under-resourced languages
  • word segmentation

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Stahlberg, F., Schlippe, T., Vogel, S., & Schultz, T. (2013). Pronunciation extraction from phoneme sequences through cross-lingual word-to-phoneme alignment. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7978 LNAI, pp. 260-272). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7978 LNAI). https://doi.org/10.1007/978-3-642-39593-2_23

@inproceedings{8dcf164527fd43d6b3ec00642c22894f,
title = "Pronunciation extraction from phoneme sequences through cross-lingual word-to-phoneme alignment",
abstract = "With the help of written translations in a source language, we cross-lingually segment phoneme sequences in a target language into word units using our new alignment model Model 3P [17]. From this, we deduce phonetic transcriptions of target language words, introduce the vocabulary in terms of word IDs, and extract a pronunciation dictionary. Our approach is highly relevant to bootstrap dictionaries from audio data for Automatic Speech Recognition and bypass the written form in Speech-to-Speech Translation, particularly in the context of under-resourced languages, and those which are not written at all. Analyzing 14 translations in 9 languages to build a dictionary for English shows that the quality of the resulting dictionary is better in case of close vocabulary sizes in source and target language, shorter sentences, more word repetitions, and formal equivalent translations.",
keywords = "pronunciation dictionary, speech-to-speech translation, under-resourced languages, word segmentation",
author = "Felix Stahlberg and Tim Schlippe and Stephan Vogel and Tanja Schultz",
year = "2013",
month = "9",
day = "3",
doi = "10.1007/978-3-642-39593-2_23",
language = "English",
isbn = "9783642395925",
volume = "7978 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "260--272",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Pronunciation extraction from phoneme sequences through cross-lingual word-to-phoneme alignment

AU - Stahlberg, Felix

AU - Schlippe, Tim

AU - Vogel, Stephan

AU - Schultz, Tanja

PY - 2013/9/3

Y1 - 2013/9/3

AB - With the help of written translations in a source language, we cross-lingually segment phoneme sequences in a target language into word units using our new alignment model Model 3P [17]. From this, we deduce phonetic transcriptions of target language words, introduce the vocabulary in terms of word IDs, and extract a pronunciation dictionary. Our approach is highly relevant to bootstrap dictionaries from audio data for Automatic Speech Recognition and bypass the written form in Speech-to-Speech Translation, particularly in the context of under-resourced languages, and those which are not written at all. Analyzing 14 translations in 9 languages to build a dictionary for English shows that the quality of the resulting dictionary is better in case of close vocabulary sizes in source and target language, shorter sentences, more word repetitions, and formal equivalent translations.

KW - pronunciation dictionary

KW - speech-to-speech translation

KW - under-resourced languages

KW - word segmentation

UR - http://www.scopus.com/inward/record.url?scp=84883149008&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84883149008&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-39593-2_23

DO - 10.1007/978-3-642-39593-2_23

M3 - Conference contribution

SN - 9783642395925

VL - 7978 LNAI

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 260

EP - 272

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -