Automatic Generation and Reranking of SQL-Derived Answers to NL Questions

Alessandra Giordani, Alessandro Moschitti

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

In this paper, given a relational database, we automatically translate a natural language question into an SQL query retrieving the correct answer. We exploit the structure of the DB to generate a set of candidate SQL queries, which we rerank with an SVM ranker based on tree kernels. In particular, we use linguistic dependencies in the natural language question and the DB metadata to build a set of plausible SELECT, WHERE and FROM clauses enriched with meaningful joins. Then, we combine all the clauses to get the set of all possible SQL queries, producing candidate queries to answer the question. This approach can be recursively applied to deal with complex questions that require nested queries. We sort the candidates by correctness score, using a weighting scheme applied to the query generation rules. Then, we use an SVM ranker trained with structural kernels to reorder the list of question and query pairs, where both members are represented as syntactic trees. The F-measure of our model on standard benchmarks is in line with the best models (85% on the first question), which use external and expensive hand-crafted resources such as semantic interpretation. Moreover, we can provide a set of candidate answers with an answer recall of about 92% and 96% within the first 2 and 5 candidates, respectively.
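
The generation and sorting steps described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the clause strings, the weights, the toy schema (`state`, `city`), the function names `generate_candidates` and `recall_at_k`, and the choice of multiplying clause weights to score a query. The abstract does not specify the weighting scheme, and the SVM reranking with tree kernels is not reproduced.

```python
from itertools import product

# Hypothetical clause candidates with made-up weights (not taken from the paper).
select_clauses = [("SELECT state.capital", 0.9), ("SELECT state.name", 0.4)]
from_clauses = [
    ("FROM state", 0.8),
    ("FROM state JOIN city ON city.state_name = state.name", 0.3),
]
where_clauses = [("WHERE state.name = 'texas'", 0.7), ("WHERE city.name = 'texas'", 0.2)]


def generate_candidates(selects, froms, wheres):
    """Combine every SELECT, FROM and WHERE clause into a scored candidate query.

    The score is the product of the clause weights, which is an assumption of
    this sketch. No check is made that a combination is valid SQL for the schema.
    """
    candidates = []
    for (s, ws), (f, wf), (w, ww) in product(selects, froms, wheres):
        candidates.append((f"{s} {f} {w}", ws * wf * ww))
    # Highest-scoring candidate first.
    return sorted(candidates, key=lambda c: c[1], reverse=True)


def recall_at_k(ranked_lists, gold_queries, k):
    """Fraction of questions whose gold query appears among the top k candidates."""
    hits = 0
    for ranked, gold in zip(ranked_lists, gold_queries):
        if gold in [query for query, _ in ranked[:k]]:
            hits += 1
    return hits / len(gold_queries)


candidates = generate_candidates(select_clauses, from_clauses, where_clauses)
gold = "SELECT state.capital FROM state WHERE state.name = 'texas'"
top1_recall = recall_at_k([candidates], [gold], k=1)
```

In a full system, the sorted list would then be passed to a learned reranker, and recall at 2 or 5 candidates would be measured over a whole benchmark rather than a single question.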

Original language: English
Title of host publication: Communications in Computer and Information Science
Publisher: Springer Verlag
Pages: 59-76
Number of pages: 18
Volume: 379 CCIS
ISBN (Print): 9783642452598
DOIs: https://doi.org/10.1007/978-3-642-45260-4_5
Publication status: Published - 1 Jan 2013
Externally published: Yes
Event: 2nd International Workshop on Trustworthy Eternal Systems via Evolving Software, Data and Knowledge, EternalS 2012 - Montpellier, France
Duration: 28 Aug 2012 → 28 Aug 2012

Publication series

Name: Communications in Computer and Information Science
Volume: 379 CCIS
ISSN (Print): 1865-0929

Other

Other: 2nd International Workshop on Trustworthy Eternal Systems via Evolving Software, Data and Knowledge, EternalS 2012
Country: France
City: Montpellier
Period: 28/8/12 → 28/8/12

Fingerprint

Syntactics
Metadata
Linguistics
Semantics

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Giordani, A., & Moschitti, A. (2013). Automatic Generation and Reranking of SQL-Derived Answers to NL Questions. In Communications in Computer and Information Science (Vol. 379 CCIS, pp. 59-76). (Communications in Computer and Information Science; Vol. 379 CCIS). Springer Verlag. https://doi.org/10.1007/978-3-642-45260-4_5

Automatic Generation and Reranking of SQL-Derived Answers to NL Questions. / Giordani, Alessandra; Moschitti, Alessandro.

Communications in Computer and Information Science. Vol. 379 CCIS Springer Verlag, 2013. p. 59-76 (Communications in Computer and Information Science; Vol. 379 CCIS).

Giordani, A & Moschitti, A 2013, Automatic Generation and Reranking of SQL-Derived Answers to NL Questions. in Communications in Computer and Information Science. vol. 379 CCIS, Communications in Computer and Information Science, vol. 379 CCIS, Springer Verlag, pp. 59-76, 2nd International Workshop on Trustworthy Eternal Systems via Evolving Software, Data and Knowledge, EternalS 2012, Montpellier, France, 28/8/12. https://doi.org/10.1007/978-3-642-45260-4_5
Giordani A, Moschitti A. Automatic Generation and Reranking of SQL-Derived Answers to NL Questions. In Communications in Computer and Information Science. Vol. 379 CCIS. Springer Verlag. 2013. p. 59-76. (Communications in Computer and Information Science). https://doi.org/10.1007/978-3-642-45260-4_5
Giordani, Alessandra ; Moschitti, Alessandro. / Automatic Generation and Reranking of SQL-Derived Answers to NL Questions. Communications in Computer and Information Science. Vol. 379 CCIS Springer Verlag, 2013. pp. 59-76 (Communications in Computer and Information Science).
@inproceedings{dcb6b837d30c4fe09103e7e7bcf16a93,
title = "Automatic Generation and Reranking of SQL-Derived Answers to NL Questions",
abstract = "In this paper, given a relational database, we automatically translate a natural language question into an SQL query retrieving the correct answer. We exploit the structure of the DB to generate a set of candidate SQL queries, which we rerank with an SVM ranker based on tree kernels. In particular, we use linguistic dependencies in the natural language question and the DB metadata to build a set of plausible SELECT, WHERE and FROM clauses enriched with meaningful joins. Then, we combine all the clauses to get the set of all possible SQL queries, producing candidate queries to answer the question. This approach can be recursively applied to deal with complex questions that require nested queries. We sort the candidates by correctness score, using a weighting scheme applied to the query generation rules. Then, we use an SVM ranker trained with structural kernels to reorder the list of question and query pairs, where both members are represented as syntactic trees. The F-measure of our model on standard benchmarks is in line with the best models (85{\%} on the first question), which use external and expensive hand-crafted resources such as semantic interpretation. Moreover, we can provide a set of candidate answers with an answer recall of about 92{\%} and 96{\%} within the first 2 and 5 candidates, respectively.",
author = "Alessandra Giordani and Alessandro Moschitti",
year = "2013",
month = "1",
day = "1",
doi = "10.1007/978-3-642-45260-4_5",
language = "English",
isbn = "9783642452598",
volume = "379 CCIS",
series = "Communications in Computer and Information Science",
publisher = "Springer Verlag",
pages = "59--76",
booktitle = "Communications in Computer and Information Science",

}

TY - GEN

T1 - Automatic Generation and Reranking of SQL-Derived Answers to NL Questions

AU - Giordani, Alessandra

AU - Moschitti, Alessandro

PY - 2013/1/1

Y1 - 2013/1/1

AB - In this paper, given a relational database, we automatically translate a natural language question into an SQL query retrieving the correct answer. We exploit the structure of the DB to generate a set of candidate SQL queries, which we rerank with an SVM ranker based on tree kernels. In particular, we use linguistic dependencies in the natural language question and the DB metadata to build a set of plausible SELECT, WHERE and FROM clauses enriched with meaningful joins. Then, we combine all the clauses to get the set of all possible SQL queries, producing candidate queries to answer the question. This approach can be recursively applied to deal with complex questions that require nested queries. We sort the candidates by correctness score, using a weighting scheme applied to the query generation rules. Then, we use an SVM ranker trained with structural kernels to reorder the list of question and query pairs, where both members are represented as syntactic trees. The F-measure of our model on standard benchmarks is in line with the best models (85% on the first question), which use external and expensive hand-crafted resources such as semantic interpretation. Moreover, we can provide a set of candidate answers with an answer recall of about 92% and 96% within the first 2 and 5 candidates, respectively.

UR - http://www.scopus.com/inward/record.url?scp=84904636807&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84904636807&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-45260-4_5

DO - 10.1007/978-3-642-45260-4_5

M3 - Conference contribution

SN - 9783642452598

VL - 379 CCIS

T3 - Communications in Computer and Information Science

SP - 59

EP - 76

BT - Communications in Computer and Information Science

PB - Springer Verlag

ER -