Corpora for automatically learning to map natural language questions into SQL Queries

Alessandra Giordani, Alessandro Moschitti

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

Automatically translating natural language into machine-readable instructions is one of the major interesting and challenging tasks in Natural Language (NL) Processing. This problem can be addressed by using machine learning algorithms to generate a function that finds mappings between natural language and programming language semantics. For this purpose, suitable annotated and structured data are required. In this paper, we describe our method to construct and semi-automatically annotate these kinds of data, which consist of pairs of NL questions and SQL queries. Additionally, we describe two different datasets obtained by applying our annotation method to two well-known corpora, GEOQUERIES and RESTQUERIES. Since we believe that syntactic levels are important, we also generate and make available relational pairs represented by means of their syntactic trees, whose lexical content has been generalized. We validate the quality of our corpora by experimenting with them and our machine learning models to derive automatic NL/SQL translators. Our promising results suggest that our corpora can be effectively used to carry out research in the field of natural language interfaces to databases.

Original language: English
Title of host publication: Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
Publisher: European Language Resources Association (ELRA)
Pages: 2336-2339
Number of pages: 4
ISBN (Electronic): 2951740867, 9782951740860
Publication status: Published - 1 Jan 2010
Event: 7th International Conference on Language Resources and Evaluation, LREC 2010 - Valletta, Malta
Duration: 17 May 2010 to 23 May 2010

Other

Other: 7th International Conference on Language Resources and Evaluation, LREC 2010
Country: Malta
City: Valletta
Period: 17/5/10 to 23/5/10

Fingerprint

language
learning
programming language
translator
Natural Language
semantics
instruction
Machine Learning
Syntax
Data Base
Programming Languages
Natural Language Processing
Translating
Annotation
Learning Model
Natural Language Interfaces
Translator

ASJC Scopus subject areas

  • Education
  • Library and Information Sciences
  • Linguistics and Language
  • Language and Linguistics

Cite this

Giordani, A., & Moschitti, A. (2010). Corpora for automatically learning to map natural language questions into SQL Queries. In Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010 (pp. 2336-2339). European Language Resources Association (ELRA).

@inproceedings{f256b48185024ae9a7aa9ecaa6eea91a,
title = "Corpora for automatically learning to map natural language questions into SQL Queries",
author = "Alessandra Giordani and Alessandro Moschitti",
year = "2010",
month = "1",
day = "1",
language = "English",
pages = "2336--2339",
booktitle = "Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010",
publisher = "European Language Resources Association (ELRA)",

}

TY - GEN

T1 - Corpora for automatically learning to map natural language questions into SQL Queries

AU - Giordani, Alessandra

AU - Moschitti, Alessandro

PY - 2010/1/1

Y1 - 2010/1/1

UR - http://www.scopus.com/inward/record.url?scp=85015295163&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85015295163&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85015295163

SP - 2336

EP - 2339

BT - Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010

PB - European Language Resources Association (ELRA)

ER -