Re-ranking models based-on small training data for Spoken language understanding

Marco Dinarelli, Alessandro Moschitti, Giuseppe Riccardi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Citations (Scopus)

Abstract

The design of practical language applications by means of statistical approaches requires annotated data, which is one of the most critical constraints. This is particularly true for Spoken Dialog Systems, since considerable domain-specific conceptual annotation is needed to obtain accurate Language Understanding models. Since data annotation is usually costly, methods to reduce the amount of data required are needed. In this paper, we show that better feature representations serve this purpose and that structure kernels provide the needed improved representation. Given the relatively high computational cost of kernel methods, we apply them only to re-rank the list of hypotheses provided by a fast generative model. Experiments with Support Vector Machines and different kernels on two different dialog corpora show that our re-ranking models can achieve better results than state-of-the-art approaches when only a small amount of data is available.
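
The re-ranking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a kernel perceptron stands in for the Support Vector Machine, a simple n-gram kernel over concept-label sequences stands in for the structure kernels, and the function names, labels, and toy data are all hypothetical.

```python
from collections import Counter


def ngram_counts(seq, n_max=2):
    """Count contiguous n-grams of length 1..n_max in a label sequence."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts


def kernel(a, b):
    """N-gram kernel: dot product of the two n-gram count vectors.
    (Assumption: a stand-in for the structure kernels named in the abstract.)"""
    ca, cb = ngram_counts(a), ngram_counts(b)
    return sum(v * cb[k] for k, v in ca.items())


def score(hyp, support):
    """Score a hypothesis against weighted (better, worse) training pairs."""
    return sum(alpha * (kernel(good, hyp) - kernel(bad, hyp))
               for (good, bad), alpha in support)


def train(pairs, epochs=5):
    """Kernel perceptron over preference pairs (a stand-in for SVM training).
    A pair's weight grows whenever the current model fails to rank
    its better hypothesis strictly above its worse one."""
    alphas = [0] * len(pairs)
    for _ in range(epochs):
        for i, (good, bad) in enumerate(pairs):
            support = list(zip(pairs, alphas))
            if score(good, support) <= score(bad, support):
                alphas[i] += 1
    return list(zip(pairs, alphas))


def rerank(nbest, support):
    """Return the hypothesis from the n-best list with the highest score."""
    return max(nbest, key=lambda hyp: score(hyp, support))


# Hypothetical toy data: each hypothesis is a sequence of concept labels.
training_pairs = [(["city", "date"], ["city", "city"])]
model = train(training_pairs)

# n-best list as a fast first-pass model might emit it, best guess first.
nbest = [["city", "city", "time"], ["city", "date", "time"]]
best = rerank(nbest, model)
```

In this toy setup the re-ranker promotes the second hypothesis over the first-pass model's top choice, because it shares more n-grams with the preferred training example. The expensive kernel computation is applied only to the short n-best list, which is the motivation the abstract gives for re-ranking.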

Original language: English
Title of host publication: EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009
Pages: 1076-1085
Number of pages: 10
Publication status: Published - 1 Dec 2009
Externally published: Yes
Event: 2009 Conference on Empirical Methods in Natural Language Processing, EMNLP 2009, Held in Conjunction with ACL-IJCNLP 2009 - Singapore, Singapore
Duration: 6 Aug 2009 → 7 Aug 2009

Other

Other: 2009 Conference on Empirical Methods in Natural Language Processing, EMNLP 2009, Held in Conjunction with ACL-IJCNLP 2009
Country: Singapore
City: Singapore
Period: 6/8/09 → 7/8/09

Fingerprint

Support vector machines
Costs
Experiments

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Cite this

Dinarelli, M., Moschitti, A., & Riccardi, G. (2009). Re-ranking models based-on small training data for Spoken language understanding. In EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009 (pp. 1076-1085)

Re-ranking models based-on small training data for Spoken language understanding. / Dinarelli, Marco; Moschitti, Alessandro; Riccardi, Giuseppe.

EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009. 2009. p. 1076-1085.

Dinarelli, M, Moschitti, A & Riccardi, G 2009, Re-ranking models based-on small training data for Spoken language understanding. in EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009. pp. 1076-1085, 2009 Conference on Empirical Methods in Natural Language Processing, EMNLP 2009, Held in Conjunction with ACL-IJCNLP 2009, Singapore, Singapore, 6/8/09.
Dinarelli M, Moschitti A, Riccardi G. Re-ranking models based-on small training data for Spoken language understanding. In EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009. 2009. p. 1076-1085
Dinarelli, Marco ; Moschitti, Alessandro ; Riccardi, Giuseppe. / Re-ranking models based-on small training data for Spoken language understanding. EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009. 2009. pp. 1076-1085
@inproceedings{8da3bd96a7db4a67ac362380c5f86a0e,
title = "Re-ranking models based-on small training data for Spoken language understanding",
abstract = "The design of practical language applications by means of statistical approaches requires annotated data, which is one of the most critical constraints. This is particularly true for Spoken Dialog Systems, since considerable domain-specific conceptual annotation is needed to obtain accurate Language Understanding models. Since data annotation is usually costly, methods to reduce the amount of data required are needed. In this paper, we show that better feature representations serve this purpose and that structure kernels provide the needed improved representation. Given the relatively high computational cost of kernel methods, we apply them only to re-rank the list of hypotheses provided by a fast generative model. Experiments with Support Vector Machines and different kernels on two different dialog corpora show that our re-ranking models can achieve better results than state-of-the-art approaches when only a small amount of data is available.",
author = "Marco Dinarelli and Alessandro Moschitti and Giuseppe Riccardi",
year = "2009",
month = "12",
day = "1",
language = "English",
pages = "1076--1085",
booktitle = "EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009",

}

TY - GEN

T1 - Re-ranking models based-on small training data for Spoken language understanding

AU - Dinarelli, Marco

AU - Moschitti, Alessandro

AU - Riccardi, Giuseppe

PY - 2009/12/1

Y1 - 2009/12/1

AB - The design of practical language applications by means of statistical approaches requires annotated data, which is one of the most critical constraints. This is particularly true for Spoken Dialog Systems, since considerable domain-specific conceptual annotation is needed to obtain accurate Language Understanding models. Since data annotation is usually costly, methods to reduce the amount of data required are needed. In this paper, we show that better feature representations serve this purpose and that structure kernels provide the needed improved representation. Given the relatively high computational cost of kernel methods, we apply them only to re-rank the list of hypotheses provided by a fast generative model. Experiments with Support Vector Machines and different kernels on two different dialog corpora show that our re-ranking models can achieve better results than state-of-the-art approaches when only a small amount of data is available.

UR - http://www.scopus.com/inward/record.url?scp=77955442707&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77955442707&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:77955442707

SP - 1076

EP - 1085

BT - EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009

ER -