Sphinx

Empowering Impala for efficient execution of SQL queries on big spatial data

Ahmed Eldawy, Ibrahim Sabek, Mostafa Elganainy, Ammar Bakeer, Ahmed Abdelmotaleb, Mohamed Mokbel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents Sphinx, a full-fledged open-source system for big spatial data which overcomes the limitations of existing systems by adopting a standard SQL interface and by providing a highly efficient core built inside the core of the Apache Impala system. Sphinx is composed of four main layers, namely, query parser, indexer, query planner, and query executor. The query parser injects spatial data types and functions into the SQL interface of Sphinx. The indexer creates spatial indexes in Sphinx by adopting a two-layered index design. The query planner utilizes these indexes to construct efficient query plans for range query and spatial join operations. Finally, the query executor carries out these plans on big spatial datasets in a distributed cluster. A system prototype of Sphinx running on real datasets shows up to three orders of magnitude performance improvement over plain-vanilla Impala, SpatialHadoop, and PostGIS.

Original language: English
Title of host publication: Advances in Spatial and Temporal Databases - 15th International Symposium, SSTD 2017, Proceedings
Publisher: Springer Verlag
Pages: 65-83
Number of pages: 19
ISBN (Print): 9783319643663
DOIs: https://doi.org/10.1007/978-3-319-64367-0_4
Publication status: Published - 1 Jan 2017
Externally published: Yes
Event: 15th International Symposium on Spatial and Temporal Databases, SSTD 2017 - Arlington, United States
Duration: 21 Aug 2017 - 23 Aug 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10411 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 15th International Symposium on Spatial and Temporal Databases, SSTD 2017
Country: United States
City: Arlington
Period: 21/8/17 - 23/8/17

Fingerprint

Spatial Data
Query
Spatial Index
Range Query
Open Source
Join
Prototype

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Eldawy, A., Sabek, I., Elganainy, M., Bakeer, A., Abdelmotaleb, A., & Mokbel, M. (2017). Sphinx: Empowering Impala for efficient execution of SQL queries on big spatial data. In Advances in Spatial and Temporal Databases - 15th International Symposium, SSTD 2017, Proceedings (pp. 65-83). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10411 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-64367-0_4

@inproceedings{831e1d4c591f4619b6b22b0d3c922688,
title = "Sphinx: Empowering Impala for efficient execution of SQL queries on big spatial data",
abstract = "This paper presents Sphinx, a full-fledged open-source system for big spatial data which overcomes the limitations of existing systems by adopting a standard SQL interface and by providing a highly efficient core built inside the core of the Apache Impala system. Sphinx is composed of four main layers, namely, query parser, indexer, query planner, and query executor. The query parser injects spatial data types and functions into the SQL interface of Sphinx. The indexer creates spatial indexes in Sphinx by adopting a two-layered index design. The query planner utilizes these indexes to construct efficient query plans for range query and spatial join operations. Finally, the query executor carries out these plans on big spatial datasets in a distributed cluster. A system prototype of Sphinx running on real datasets shows up to three orders of magnitude performance improvement over plain-vanilla Impala, SpatialHadoop, and PostGIS.",
author = "Ahmed Eldawy and Ibrahim Sabek and Mostafa Elganainy and Ammar Bakeer and Ahmed Abdelmotaleb and Mohamed Mokbel",
year = "2017",
month = "1",
day = "1",
doi = "10.1007/978-3-319-64367-0_4",
language = "English",
isbn = "9783319643663",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "65--83",
booktitle = "Advances in Spatial and Temporal Databases - 15th International Symposium, SSTD 2017, Proceedings",

}

TY - GEN
T1 - Sphinx
T2 - Empowering Impala for efficient execution of SQL queries on big spatial data
AU - Eldawy, Ahmed
AU - Sabek, Ibrahim
AU - Elganainy, Mostafa
AU - Bakeer, Ammar
AU - Abdelmotaleb, Ahmed
AU - Mokbel, Mohamed
PY - 2017/1/1
Y1 - 2017/1/1
AB - This paper presents Sphinx, a full-fledged open-source system for big spatial data which overcomes the limitations of existing systems by adopting a standard SQL interface and by providing a highly efficient core built inside the core of the Apache Impala system. Sphinx is composed of four main layers, namely, query parser, indexer, query planner, and query executor. The query parser injects spatial data types and functions into the SQL interface of Sphinx. The indexer creates spatial indexes in Sphinx by adopting a two-layered index design. The query planner utilizes these indexes to construct efficient query plans for range query and spatial join operations. Finally, the query executor carries out these plans on big spatial datasets in a distributed cluster. A system prototype of Sphinx running on real datasets shows up to three orders of magnitude performance improvement over plain-vanilla Impala, SpatialHadoop, and PostGIS.
UR - http://www.scopus.com/inward/record.url?scp=85028467434&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85028467434&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-64367-0_4
DO - 10.1007/978-3-319-64367-0_4
M3 - Conference contribution
SN - 9783319643663
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 65
EP - 83
BT - Advances in Spatial and Temporal Databases - 15th International Symposium, SSTD 2017, Proceedings
PB - Springer Verlag
ER -