On Spatial Joins in MapReduce

Ibrahim Sabek, Mohamed Mokbel

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

This paper presents the first attempt at a full-fledged query optimizer for MapReduce-based spatial join algorithms. The optimizer builds on its own taxonomy, which covers almost all possible ways of performing a spatial join on any two input datasets. The optimizer comes in two flavors: cost-based and rule-based. Given two input datasets, the cost-based query optimizer evaluates the cost of every option in the taxonomy and selects the one with the lowest cost. The rule-based query optimizer abstracts the cost models of the cost-based optimizer into a set of simple, easy-to-check heuristic rules, and then applies these rules to select the lowest-cost option. Both query optimizers are deployed and experimentally evaluated inside a widely used open-source MapReduce-based big spatial data system. Exhaustive experiments show that both query optimizers always make the right decision when spatially joining any two datasets of up to 500 GB each.
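The two optimizer flavors described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three option names, the `DatasetStats` fields, the `JOB_OVERHEAD` constant, and the toy cost formulas are hypothetical and are not the taxonomy, cost models, or rules from the paper. The sketch only shows the general pattern: the cost-based variant evaluates every candidate and takes the minimum, while the rule-based variant replaces the cost evaluation with a few cheap checks intended to reach the same choice.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class DatasetStats:
    """Hypothetical per-dataset statistics an optimizer might consult."""
    size_gb: float
    is_indexed: bool  # assumed to mean "already spatially partitioned"


JOB_OVERHEAD = 10.0  # made-up fixed cost charged per MapReduce job


def cost_join_existing(r: DatasetStats, s: DatasetStats) -> float:
    # Toy model: one job that scans both inputs; only valid if both are indexed.
    if r.is_indexed and s.is_indexed:
        return r.size_gb + s.size_gb + JOB_OVERHEAD
    return math.inf


def cost_repartition_one(r: DatasetStats, s: DatasetStats) -> float:
    # Toy model: repartition one input to match the other, then join (two jobs).
    if not (r.is_indexed or s.is_indexed):
        return math.inf
    if r.is_indexed and s.is_indexed:
        moved = min(r.size_gb, s.size_gb)
    else:
        moved = s.size_gb if r.is_indexed else r.size_gb
    return r.size_gb + s.size_gb + moved + 2 * JOB_OVERHEAD


def cost_partition_both(r: DatasetStats, s: DatasetStats) -> float:
    # Toy model: partition both inputs, then join (three jobs); always valid.
    return 2 * (r.size_gb + s.size_gb) + 3 * JOB_OVERHEAD


CostFn = Callable[[DatasetStats, DatasetStats], float]

# Hypothetical stand-in for "all options in the taxonomy".
OPTIONS: Dict[str, CostFn] = {
    "join_existing": cost_join_existing,
    "repartition_one": cost_repartition_one,
    "partition_both": cost_partition_both,
}


def cost_based_choice(r: DatasetStats, s: DatasetStats,
                      options: Dict[str, CostFn] = OPTIONS) -> str:
    """Evaluate the cost of every option and return the cheapest one."""
    return min(options, key=lambda name: options[name](r, s))


def rule_based_choice(r: DatasetStats, s: DatasetStats) -> str:
    """Cheap heuristic rules distilled from the toy cost functions above."""
    if r.is_indexed and s.is_indexed:
        return "join_existing"
    if r.is_indexed or s.is_indexed:
        return "repartition_one"
    return "partition_both"


if __name__ == "__main__":
    r = DatasetStats(size_gb=100.0, is_indexed=True)
    s = DatasetStats(size_gb=50.0, is_indexed=False)
    print(cost_based_choice(r, s), rule_based_choice(r, s))
```

In this toy setting the rules agree with the cost-based choice by construction; whether a real rule set matches a real cost model is an empirical question of the kind the abstract says the experiments address.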

Original language: English
Title of host publication: GIS
Subtitle of host publication: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems
Publisher: Association for Computing Machinery
Volume: 2017-November
ISBN (Print): 9781450354905
DOIs: https://doi.org/10.1145/3139958.3139967
Publication status: Published - 7 Nov 2017
Event: 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2017 - Redondo Beach, United States
Duration: 7 Nov 2017 to 10 Nov 2017

Other

Other: 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2017
Country: United States
City: Redondo Beach
Period: 7/11/17 to 10/11/17

Fingerprint

MapReduce
Join
Query
Costs
Taxonomy
Cost Model
Spatial Data
Joining
Open Source
Flavors
Heuristics
Cover
Evaluate
Experiment

Keywords

  • Hadoop
  • MapReduce
  • Query Optimization
  • Spatial Join

ASJC Scopus subject areas

  • Earth-Surface Processes
  • Computer Science Applications
  • Modelling and Simulation
  • Computer Graphics and Computer-Aided Design
  • Information Systems

Cite this

Sabek, I., & Mokbel, M. (2017). On Spatial Joins in MapReduce. In GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems (Vol. 2017-November). [21] Association for Computing Machinery. https://doi.org/10.1145/3139958.3139967

@inproceedings{2fb37f77beb44b72bfa76e21a33116b2,
title = "On Spatial Joins in MapReduce",
abstract = "This paper provides the first attempt for a full-fledged query optimizer for MapReduce-based spatial join algorithms. The optimizer develops its own taxonomy that covers almost all possible ways of doing a spatial join for any two input datasets. The optimizer comes in two flavors; cost-based and rule-based. Given two input data sets, the cost-based query optimizer evaluates the costs of all possible options in the developed taxonomy, and selects the one with the lowest cost. The rule-based query optimizer abstracts the developed cost models of the cost-based optimizer into a set of simple easy-to-check heuristic rules. Then, it applies its rules to select the lowest cost option. Both query optimizers are deployed and experimentally evaluated inside a widely used open-source MapReduce-based big spatial data system. Exhaustive experiments show that both query optimizers are always successful in taking the right decision for spatially joining any two datasets of up to 500GB each.",
keywords = "Hadoop, MapReduce, Query Optimization, Spatial Join",
author = "Ibrahim Sabek and Mohamed Mokbel",
year = "2017",
month = "11",
day = "7",
doi = "10.1145/3139958.3139967",
language = "English",
isbn = "9781450354905",
volume = "2017-November",
booktitle = "GIS",
publisher = "Association for Computing Machinery",
}

TY - GEN

T1 - On Spatial Joins in MapReduce

AU - Sabek, Ibrahim

AU - Mokbel, Mohamed

PY - 2017/11/7

Y1 - 2017/11/7

N2 - This paper provides the first attempt for a full-fledged query optimizer for MapReduce-based spatial join algorithms. The optimizer develops its own taxonomy that covers almost all possible ways of doing a spatial join for any two input datasets. The optimizer comes in two flavors; cost-based and rule-based. Given two input data sets, the cost-based query optimizer evaluates the costs of all possible options in the developed taxonomy, and selects the one with the lowest cost. The rule-based query optimizer abstracts the developed cost models of the cost-based optimizer into a set of simple easy-to-check heuristic rules. Then, it applies its rules to select the lowest cost option. Both query optimizers are deployed and experimentally evaluated inside a widely used open-source MapReduce-based big spatial data system. Exhaustive experiments show that both query optimizers are always successful in taking the right decision for spatially joining any two datasets of up to 500GB each.

KW - Hadoop

KW - MapReduce

KW - Query Optimization

KW - Spatial Join

UR - http://www.scopus.com/inward/record.url?scp=85040966160&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85040966160&partnerID=8YFLogxK

U2 - 10.1145/3139958.3139967

DO - 10.1145/3139958.3139967

M3 - Conference contribution

SN - 9781450354905

VL - 2017-November

BT - GIS

PB - Association for Computing Machinery

ER -