Large vocabulary continuous speech recognition (LVCSR) for agglutinative languages suffers from poor lexical coverage: because words are formed by concatenating morphemes, no finite full-word lexicon can cover all possible word forms. Recognizing sub-word units instead, and building the language model over those units, solves the coverage problem. However, sub-word units increase acoustic confusability and shorten the effective history length of the language model, since a fixed-order model over sub-word units spans fewer words than one over whole words. We introduce new methods for choosing lexicon units, and we incorporate linguistic constraints into a statistical language model built on the new units. We represent both the statistical language model and the linguistic constraints as weighted finite-state machines (WFSMs) and combine them to obtain a novel language model. We evaluate the new language model and show that it achieves a 3% relative reduction in word error rate on a test set of 2151 words.
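The abstract does not say exactly how the two machines are combined. One standard way to merge a statistical language model with hard linguistic constraints, when both are weighted acceptors, is intersection in the tropical semiring, where weights are negative log probabilities added along a path. The minimal Python sketch below works under that assumption. The `WFSA` class, the `intersect` function, the Turkish-like sub-word units `ev`, `+ler`, `+de`, the bigram probabilities and the constraint machine are all hypothetical illustrations, not the authors' units, data or implementation. A real system would use a WFST toolkit such as OpenFst.

```python
import math
from collections import deque

# Hypothetical weighted finite-state acceptor in the tropical semiring:
# arc weights are costs (negative log probabilities) that add along a path,
# and the best path is the one with the minimum total cost.
class WFSA:
    def __init__(self, start):
        self.start = start
        self.arcs = {}    # state -> list of (label, weight, next_state)
        self.finals = {}  # state -> final weight

    def add_arc(self, src, label, weight, dst):
        self.arcs.setdefault(src, []).append((label, weight, dst))

    def set_final(self, state, weight=0.0):
        self.finals[state] = weight

    def score(self, labels):
        """Minimum cost of accepting `labels`, or math.inf if rejected."""
        frontier = {self.start: 0.0}
        for label in labels:
            nxt = {}
            for state, cost in frontier.items():
                for arc_label, w, dst in self.arcs.get(state, []):
                    if arc_label == label:
                        nxt[dst] = min(nxt.get(dst, math.inf), cost + w)
            frontier = nxt
        return min((c + self.finals[s] for s, c in frontier.items() if s in self.finals),
                   default=math.inf)


def intersect(a, b):
    """Product construction: accept only sequences accepted by both a and b,
    adding the two costs (intersection in the tropical semiring)."""
    start = (a.start, b.start)
    out = WFSA(start)
    queue, seen = deque([start]), {start}
    while queue:
        qa, qb = queue.popleft()
        if qa in a.finals and qb in b.finals:
            out.set_final((qa, qb), a.finals[qa] + b.finals[qb])
        for la, wa, na in a.arcs.get(qa, []):
            for lb, wb, nb in b.arcs.get(qb, []):
                if la == lb:
                    out.add_arc((qa, qb), la, wa + wb, (na, nb))
                    if (na, nb) not in seen:
                        seen.add((na, nb))
                        queue.append((na, nb))
    return out


# Hypothetical bigram probabilities over sub-word units (rows sum to 1).
BIGRAMS = {
    "<s>":  {"ev": 0.6, "+ler": 0.2, "+de": 0.2},
    "ev":   {"+ler": 0.4, "+de": 0.3, "</s>": 0.3},
    "+ler": {"+de": 0.5, "</s>": 0.5},
    "+de":  {"+ler": 0.4, "</s>": 0.6},
}

# Statistical LM as a WFSA: one state per history unit.
lm = WFSA("<s>")
for hist, nexts in BIGRAMS.items():
    for unit, p in nexts.items():
        if unit == "</s>":
            lm.set_final(hist, -math.log(p))
        else:
            lm.add_arc(hist, unit, -math.log(p), unit)

# Hypothetical hard morphotactic constraint (all weights 0): a word must
# start with the stem, and plural "+ler" may not follow locative "+de".
constraint = WFSA("start")
constraint.add_arc("start", "ev", 0.0, "stem")
constraint.add_arc("stem", "+ler", 0.0, "plural")
constraint.add_arc("stem", "+de", 0.0, "case")
constraint.add_arc("plural", "+de", 0.0, "case")
for s in ("stem", "plural", "case"):
    constraint.set_final(s)

combined = intersect(lm, constraint)

# "ev +ler +de" keeps its LM cost; "ev +de +ler" is finite under the LM
# alone but rejected (math.inf) by the combined model.
for seq in (["ev", "+ler", "+de"], ["ev", "+de", "+ler"]):
    print(" ".join(seq), lm.score(seq), combined.score(seq))
```

Because the constraint machine here carries only zero weights, accepted sequences keep their language-model cost unchanged, and sequences that violate the constraint are removed. The probability mass of the removed paths is simply dropped and not renormalized in this sketch. The paper may weight its constraints or normalize differently.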
|Title of host publication||Proceedings of the IEEE 13th Signal Processing and Communications Applications Conference, SIU 2005|
|Number of pages||4|
|Publication status||Published - 2005|
|Event||IEEE 13th Signal Processing and Communications Applications Conference, SIU 2005 - Kayseri, Turkey|
|Duration||16 May 2005 → 18 May 2005|