Incorporating language constraints in sub-word based speech recognition

Hakan Erdoğan, Osman Büyük, Kemal Oflazer

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

18 Citations (Scopus)

Abstract

In large vocabulary continuous speech recognition (LVCSR) for agglutinative and inflectional languages, problems arise because the full-word lexicon is, in theory, infinitely large. Sub-word lexicon units can be used to dramatically reduce the out-of-vocabulary rate on test data, and language models can be built on these sub-word units to perform LVCSR. However, sub-word lexicon units have not always proved beneficial: shorter units are more acoustically confusable with one another, and the effective language model history is shorter than in full-word language models. To reduce these problems, we propose using the longest possible sub-word units in our lexicon, namely only half-words and full-words. We also incorporate linguistic rules for combining half-words into our statistical language model. These language constraints are represented by a rule-based weighted finite-state machine (WFSM), which can be combined with an N-gram language model to yield a better and smaller language model. We study the performance of the proposed system for Turkish LVCSR, where the language constraint enforces vowel harmony between stems and endings. We also introduce novel error-rate metrics that are more appropriate than word error rate (WER) for agglutinative languages. Using half-words with a bigram model yields a significant reduction in WER compared with a bigram full-word model. In addition, combining a trigram half-word language model with the vowel-harmony WFSM further improves accuracy when rescoring the bigram lattices.
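The abstract describes the vowel-harmony constraint only at a high level. The Python sketch below is a minimal illustration under our own assumptions, not the authors' implementation: stems are written with a trailing "+" and endings with a leading "+" (a hypothetical token convention), only front/back vowel harmony is checked (rounding harmony and exceptions are ignored), the bigram table is a toy with made-up probabilities, and renormalizing after pruning is our choice. The functions `allowed`, `constrain_bigram` and `log_prob` are hypothetical names. The constraint acts as a 0/1-weighted acceptor multiplied into the bigram model.

```python
import math

# Simplified Turkish vowel classes: front/back (palatal) harmony only.
# Assumption: rounding harmony, loanword exceptions and invariant suffixes are ignored.
FRONT = set("eiöü")
BACK = set("aıou")


def vowel_class(text, from_end):
    """Return 'F' or 'B' for the last (from_end=True) or first vowel of text, else None."""
    chars = reversed(text) if from_end else text
    for ch in chars:
        if ch in FRONT:
            return "F"
        if ch in BACK:
            return "B"
    return None


def allowed(prev, nxt):
    """Hypothetical constraint acceptor: reject stem->ending pairs that violate harmony.

    Assumed token convention: stems end with '+', endings start with '+',
    full words carry no '+'. All other transitions are accepted (weight 1).
    """
    if prev.endswith("+") and nxt.startswith("+"):
        stem_cls = vowel_class(prev, from_end=True)
        end_cls = vowel_class(nxt, from_end=False)
        if stem_cls and end_cls and stem_cls != end_cls:
            return False  # weight 0: transition removed from the model
    return True


def constrain_bigram(bigram):
    """Apply the 0/1 constraint weights to a bigram table and renormalize each history."""
    constrained = {}
    for hist, dist in bigram.items():
        kept = {w: p for w, p in dist.items() if allowed(hist, w)}
        total = sum(kept.values())
        if total > 0:
            constrained[hist] = {w: p / total for w, p in kept.items()}
    return constrained


def log_prob(tokens, bigram):
    """Log-probability of a token sequence; -inf if any transition is missing."""
    lp = 0.0
    for hist, w in zip(tokens, tokens[1:]):
        p = bigram.get(hist, {}).get(w, 0.0)
        if p == 0.0:
            return float("-inf")
        lp += math.log(p)
    return lp


# Toy half-word bigram with made-up probabilities, for illustration only.
TOY_BIGRAM = {
    "<s>": {"kitap+": 0.5, "ev+": 0.5},
    "kitap+": {"+lar": 0.6, "+ler": 0.4},  # 'kitap' (book): last vowel is back
    "ev+": {"+ler": 0.7, "+lar": 0.3},     # 'ev' (house): last vowel is front
}

constrained = constrain_bigram(TOY_BIGRAM)
```

In this toy case the disharmonic sequences "kitap+ +ler" and "ev+ +lar" receive zero probability, and dropping their entries is what makes the constrained table smaller. In a full recognizer the same effect would come from composing the constraint WFSM with a finite-state representation of the N-gram model rather than from editing a probability table; the abstract does not say whether the combined model is renormalized.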

Original language: English
Title of host publication: Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop
Pages: 281-286
Number of pages: 6
Volume: 2005
DOIs: https://doi.org/10.1109/ASRU.2005.1566516
Publication status: Published - 2005
Externally published: Yes
Event: ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop - Cancun
Duration: 27 Nov 2005 to 1 Dec 2005

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Erdoğan, H., Büyük, O., & Oflazer, K. (2005). Incorporating language constraints in sub-word based speech recognition. In Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop (Vol. 2005, pp. 281-286). [1566516] https://doi.org/10.1109/ASRU.2005.1566516

Incorporating language constraints in sub-word based speech recognition. / Erdoğan, Hakan; Büyük, Osman; Oflazer, Kemal.

Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop. Vol. 2005 2005. p. 281-286 1566516.

Erdoğan, H, Büyük, O & Oflazer, K 2005, Incorporating language constraints in sub-word based speech recognition. in Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop. vol. 2005, 1566516, pp. 281-286, ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop, Cancun, 27/11/05. https://doi.org/10.1109/ASRU.2005.1566516
Erdoğan H, Büyük O, Oflazer K. Incorporating language constraints in sub-word based speech recognition. In Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop. Vol. 2005. 2005. p. 281-286. 1566516 https://doi.org/10.1109/ASRU.2005.1566516
Erdoğan, Hakan ; Büyük, Osman ; Oflazer, Kemal. / Incorporating language constraints in sub-word based speech recognition. Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop. Vol. 2005 2005. pp. 281-286
@inproceedings{303ee8a62cd341f3bdd170a0968d925d,
title = "Incorporating language constraints in sub-word based speech recognition",
author = "Hakan Erdo{\u{g}}an and Osman B{\"u}y{\"u}k and Kemal Oflazer",
year = "2005",
doi = "10.1109/ASRU.2005.1566516",
language = "English",
isbn = "0780394798",
volume = "2005",
pages = "281--286",
booktitle = "Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop",

}

TY - GEN

T1 - Incorporating language constraints in sub-word based speech recognition

AU - Erdoğan, Hakan

AU - Büyük, Osman

AU - Oflazer, Kemal

PY - 2005

Y1 - 2005

UR - http://www.scopus.com/inward/record.url?scp=33846254191&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33846254191&partnerID=8YFLogxK

U2 - 10.1109/ASRU.2005.1566516

DO - 10.1109/ASRU.2005.1566516

M3 - Conference contribution

SN - 0780394798

SN - 9780780394797

VL - 2005

SP - 281

EP - 286

BT - Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop

ER -