Comparing stochastic approaches to spoken language understanding in multiple languages

Stefan Hahn, Marco Dinarelli, Christian Raymond, Fabrice Lefèvre, Patrick Lehnen, Renato De Mori, Alessandro Moschitti, Hermann Ney, Giuseppe Riccardi

Research output: Contribution to journal › Article

56 Citations (Scopus)

Abstract

One of the first steps in building a spoken language understanding (SLU) module for dialogue systems is the extraction of flat concepts from a given word sequence, usually provided by an automatic speech recognition (ASR) system. In this paper, six different modeling approaches are investigated for the task of concept tagging. These methods include classical, well-known generative and discriminative methods such as Finite State Transducers (FSTs), Statistical Machine Translation (SMT), Maximum Entropy Markov Models (MEMMs), and Support Vector Machines (SVMs), as well as techniques more recently applied to natural language processing, such as Conditional Random Fields (CRFs) and Dynamic Bayesian Networks (DBNs). Following a detailed description of the models, experimental and comparative results are presented on three corpora in different languages and of different complexity. The French MEDIA corpus has already been used in an evaluation campaign, so a direct comparison with existing benchmarks is possible. Recently collected Italian and Polish corpora are used to test the robustness and portability of the modeling approaches. For all tasks, both manual transcriptions and ASR output are considered. In addition to single systems, methods for system combination are investigated. The best-performing model on all tasks is based on conditional random fields. On the MEDIA evaluation corpus, a concept error rate (CER) of 12.6% is achieved; here, in addition to attribute names, attribute values are extracted using a combination of a rule-based and a statistical approach. Applying system combination over all six systems with weighted ROVER, the CER drops to 12.0%.
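
The figures quoted in the abstract rest on a single metric, the concept error rate, which is easy to make concrete. The sketch below is a minimal, illustrative Python implementation assuming the usual definition (a Levenshtein alignment between the reference and hypothesized concept sequences, with substitutions, insertions, and deletions normalized by the number of reference concepts, analogous to word error rate). It is not the authors' scoring tool, and the attribute names in the toy example are hypothetical MEDIA-style labels used only for illustration.

    # Illustrative concept error rate (CER): edit distance over concept labels,
    # normalized by the number of reference concepts.
    def concept_error_rate(reference, hypothesis):
        """(substitutions + insertions + deletions) / len(reference)."""
        n, m = len(reference), len(hypothesis)
        # dp[i][j]: minimal edit cost to turn reference[:i] into hypothesis[:j]
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            dp[i][0] = i                                 # i deletions
        for j in range(m + 1):
            dp[0][j] = j                                 # j insertions
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match / substitution
                               dp[i - 1][j] + 1,         # deletion
                               dp[i][j - 1] + 1)         # insertion
        return dp[n][m] / max(n, 1)

    # Toy example: the hypothesis misses one of three reference concepts.
    ref = ["commande-tache", "nombre-chambre", "localisation-ville"]
    hyp = ["commande-tache", "localisation-ville"]
    print(f"CER = {concept_error_rate(ref, hyp):.2f}")   # one deletion -> 0.33

Under this definition, the reported 12.6% and 12.0% correspond to roughly one concept-level error for every eight reference concepts.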

Original language: English
Article number: 5639034
Pages (from-to): 1569-1583
Number of pages: 15
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 19
Issue number: 6
DOI: https://doi.org/10.1109/TASL.2010.2093520
Publication status: Published - 3 Jun 2011
Externally published: Yes

Keywords

  • Generative and discriminative models
  • spoken dialogue systems
  • system combination

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Acoustics and Ultrasonics

Cite this

Hahn, S., Dinarelli, M., Raymond, C., Lefèvre, F., Lehnen, P., De Mori, R., ... Riccardi, G. (2011). Comparing stochastic approaches to spoken language understanding in multiple languages. IEEE Transactions on Audio, Speech and Language Processing, 19(6), 1569-1583. [5639034]. https://doi.org/10.1109/TASL.2010.2093520
