Selecting sentences for answering complex questions

Yllias Chali, Shafiq Rayhan Joty

Research output: Chapter in Book/Report/Conference proceedingConference contribution

6 Citations (Scopus)

Abstract

Complex questions that require inferencing and synthesizing information from multiple documents can be seen as a kind of topicoriented, informative multi-document summarization. In this paper, we have experimented with one empirical and two unsupervised statistical machine learning techniques: k-means and Expectation Maximization (EM), for computing relative importance of the sentences. However, the performance of these approaches depends entirely on the feature set used and the weighting of these features. We extracted different kinds of features (i.e. lexical, lexical semantic, cosine similarity, basic element, tree kernel based syntactic and shallow-semantic) for each of the document sentences in order to measure its importance and relevancy to the user query. We used a local search technique to learn the weights of the features. For all our methods of generating summaries, we have shown the effects of syntactic and shallow-semantic features over the bag of words (BOW) features.

Original languageEnglish
Title of host publicationEMNLP 2008 - 2008 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference: A Meeting of SIGDAT, a Special Interest Group of the ACL
Pages304-313
Number of pages10
Publication statusPublished - 1 Dec 2008
Externally publishedYes
Event2008 Conference on Empirical Methods in Natural Language Processing, EMNLP 2008, Co-located with AMTA 2008 and the International Workshop on Spoken Language Translation - Honolulu, HI, United States
Duration: 25 Oct 200827 Oct 2008

Other

Other2008 Conference on Empirical Methods in Natural Language Processing, EMNLP 2008, Co-located with AMTA 2008 and the International Workshop on Spoken Language Translation
CountryUnited States
CityHonolulu, HI
Period25/10/0827/10/08

Fingerprint

Semantics
Syntactics
Learning systems

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Cite this

Chali, Y., & Rayhan Joty, S. (2008). Selecting sentences for answering complex questions. In EMNLP 2008 - 2008 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference: A Meeting of SIGDAT, a Special Interest Group of the ACL (pp. 304-313)

Selecting sentences for answering complex questions. / Chali, Yllias; Rayhan Joty, Shafiq.

EMNLP 2008 - 2008 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference: A Meeting of SIGDAT, a Special Interest Group of the ACL. 2008. p. 304-313.

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Chali, Y & Rayhan Joty, S 2008, Selecting sentences for answering complex questions. in EMNLP 2008 - 2008 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference: A Meeting of SIGDAT, a Special Interest Group of the ACL. pp. 304-313, 2008 Conference on Empirical Methods in Natural Language Processing, EMNLP 2008, Co-located with AMTA 2008 and the International Workshop on Spoken Language Translation, Honolulu, HI, United States, 25/10/08.
Chali Y, Rayhan Joty S. Selecting sentences for answering complex questions. In EMNLP 2008 - 2008 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference: A Meeting of SIGDAT, a Special Interest Group of the ACL. 2008. p. 304-313
Chali, Yllias ; Rayhan Joty, Shafiq. / Selecting sentences for answering complex questions. EMNLP 2008 - 2008 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference: A Meeting of SIGDAT, a Special Interest Group of the ACL. 2008. pp. 304-313
@inproceedings{5fa5b3dea6b640a8b40736607cb20556,
title = "Selecting sentences for answering complex questions",
abstract = "Complex questions that require inferencing and synthesizing information from multiple documents can be seen as a kind of topicoriented, informative multi-document summarization. In this paper, we have experimented with one empirical and two unsupervised statistical machine learning techniques: k-means and Expectation Maximization (EM), for computing relative importance of the sentences. However, the performance of these approaches depends entirely on the feature set used and the weighting of these features. We extracted different kinds of features (i.e. lexical, lexical semantic, cosine similarity, basic element, tree kernel based syntactic and shallow-semantic) for each of the document sentences in order to measure its importance and relevancy to the user query. We used a local search technique to learn the weights of the features. For all our methods of generating summaries, we have shown the effects of syntactic and shallow-semantic features over the bag of words (BOW) features.",
author = "Yllias Chali and {Rayhan Joty}, Shafiq",
year = "2008",
month = "12",
day = "1",
language = "English",
pages = "304--313",
booktitle = "EMNLP 2008 - 2008 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference: A Meeting of SIGDAT, a Special Interest Group of the ACL",

}

TY - GEN

T1 - Selecting sentences for answering complex questions

AU - Chali, Yllias

AU - Rayhan Joty, Shafiq

PY - 2008/12/1

Y1 - 2008/12/1

N2 - Complex questions that require inferencing and synthesizing information from multiple documents can be seen as a kind of topicoriented, informative multi-document summarization. In this paper, we have experimented with one empirical and two unsupervised statistical machine learning techniques: k-means and Expectation Maximization (EM), for computing relative importance of the sentences. However, the performance of these approaches depends entirely on the feature set used and the weighting of these features. We extracted different kinds of features (i.e. lexical, lexical semantic, cosine similarity, basic element, tree kernel based syntactic and shallow-semantic) for each of the document sentences in order to measure its importance and relevancy to the user query. We used a local search technique to learn the weights of the features. For all our methods of generating summaries, we have shown the effects of syntactic and shallow-semantic features over the bag of words (BOW) features.

AB - Complex questions that require inferencing and synthesizing information from multiple documents can be seen as a kind of topicoriented, informative multi-document summarization. In this paper, we have experimented with one empirical and two unsupervised statistical machine learning techniques: k-means and Expectation Maximization (EM), for computing relative importance of the sentences. However, the performance of these approaches depends entirely on the feature set used and the weighting of these features. We extracted different kinds of features (i.e. lexical, lexical semantic, cosine similarity, basic element, tree kernel based syntactic and shallow-semantic) for each of the document sentences in order to measure its importance and relevancy to the user query. We used a local search technique to learn the weights of the features. For all our methods of generating summaries, we have shown the effects of syntactic and shallow-semantic features over the bag of words (BOW) features.

UR - http://www.scopus.com/inward/record.url?scp=80053377737&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80053377737&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:80053377737

SP - 304

EP - 313

BT - EMNLP 2008 - 2008 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference: A Meeting of SIGDAT, a Special Interest Group of the ACL

ER -