Spatial coding-based approach for partitioning big spatial data in Hadoop

Xiaochuang Yao, Mohamed Mokbel, Louai Alarabi, Ahmed Eldawy, Jianyu Yang, Wenju Yun, Lin Li, Sijing Ye, Dehai Zhu

Research output: Contribution to journal, Article

12 Citations (Scopus)

Abstract

Spatial data partitioning (SDP) plays an important role in distributed storage and parallel computing for spatial data. However, because spatial data are unevenly distributed and spatial vector objects vary in volume, it is a significant challenge to ensure both optimal performance of spatial operations and data balance in the cluster. To tackle this problem, we propose a spatial coding-based approach for partitioning big spatial data in Hadoop. This approach first compresses the whole big spatial dataset based on a spatial coding matrix to create a sensing information set (SIS), including spatial code, size, count and other information. The SIS is then employed to build a spatial partitioning matrix, which is finally used to split all spatial objects into different partitions in the cluster. With our approach, neighbouring spatial objects can be partitioned into the same block, while data skew in the Hadoop distributed file system (HDFS) is minimized. The presented approach is compared, in a case study, against random sampling-based partitioning using three measures: spatial index quality, data skew in HDFS, and range query performance. The experimental results show that our method based on the spatial coding technique can improve the query performance of big spatial data, as well as the data balance in HDFS. We implemented and deployed this approach in Hadoop, and it can also efficiently support other distributed big spatial data systems.
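
The pipeline described in the abstract (spatial codes, then a per-code summary, then a partitioning table) can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not name the space-filling curve or the balancing rule, so the sketch assumes a Z-order (Morton) code on a uniform grid, point-like objects given as (x, y, size in bytes), and a greedy fill of consecutive codes up to a block capacity. All function names and parameters (morton_code, cell_code, build_sis, build_partition_map, block_bytes) are hypothetical.

```python
from collections import defaultdict


def morton_code(col, row, bits=16):
    """Interleave the bits of col and row into one Z-order code (assumed curve)."""
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)
        code |= ((row >> i) & 1) << (2 * i + 1)
    return code


def cell_code(x, y, extent, grid):
    """Return the spatial code of the grid cell that contains point (x, y)."""
    min_x, min_y, max_x, max_y = extent
    # Clamp so that points on the upper boundary fall into the last cell.
    col = min(int((x - min_x) / (max_x - min_x) * grid), grid - 1)
    row = min(int((y - min_y) / (max_y - min_y) * grid), grid - 1)
    return morton_code(col, row)


def build_sis(objects, extent, grid):
    """Summarise objects per cell: code -> [object count, total bytes]."""
    sis = defaultdict(lambda: [0, 0])
    for x, y, size in objects:
        entry = sis[cell_code(x, y, extent, grid)]
        entry[0] += 1
        entry[1] += size
    return dict(sis)


def build_partition_map(sis, block_bytes):
    """Assign consecutive codes to partitions, starting a new one when full."""
    mapping, partition, used = {}, 0, 0
    for code in sorted(sis):
        size = sis[code][1]
        # Open a new partition only if the current one already holds data.
        if used > 0 and used + size > block_bytes:
            partition += 1
            used = 0
        mapping[code] = partition
        used += size
    return mapping
```

In this sketch, only the compact summary returned by build_sis is needed to derive the code-to-partition table; each object would then be routed by looking up its cell code in that table. Because cells that are adjacent along the curve tend to be close in space, neighbouring objects tend to share a partition, while the capacity check keeps partition sizes roughly even.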

Original language: English
Pages (from-to): 60-67
Number of pages: 8
Journal: Computers and Geosciences
Volume: 106
DOI: https://doi.org/10.1016/j.cageo.2017.05.014
Publication status: Published - 1 Sep 2017
Externally published: Yes

Keywords

  • Big spatial data
  • Hadoop
  • Spatial coding-based approach
  • Spatial data partitioning

ASJC Scopus subject areas

  • Information Systems
  • Computers in Earth Sciences

Cite this

Yao, X., Mokbel, M., Alarabi, L., Eldawy, A., Yang, J., Yun, W., Li, L., Ye, S., & Zhu, D. (2017). Spatial coding-based approach for partitioning big spatial data in Hadoop. Computers and Geosciences, 106, 60-67. https://doi.org/10.1016/j.cageo.2017.05.014