ST-Hadoop: a MapReduce framework for spatio-temporal data

Louai Alarabi, Mohamed Mokbel, Mashaal Musleh

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

This paper presents ST-Hadoop, the first full-fledged open-source MapReduce framework with native support for spatio-temporal data. ST-Hadoop is a comprehensive extension to Hadoop and SpatialHadoop that injects spatio-temporal data awareness into each of their layers, namely the language, indexing, and operations layers. In the language layer, ST-Hadoop provides built-in spatio-temporal data types and operations. In the indexing layer, ST-Hadoop loads and partitions data spatio-temporally across computation nodes in the Hadoop Distributed File System (HDFS), in a way that mimics spatio-temporal index structures; this results in orders of magnitude better performance than Hadoop and SpatialHadoop when dealing with spatio-temporal data and queries. In the operations layer, ST-Hadoop ships with support for three fundamental spatio-temporal queries: spatio-temporal range, top-k nearest neighbor, and join queries. ST-Hadoop is extensible, so others can add features and operations using approaches similar to those described in the paper. Extensive experiments on a 10 TB dataset containing over 1 billion spatio-temporal records show that ST-Hadoop achieves orders of magnitude better performance than Hadoop and SpatialHadoop when dealing with spatio-temporal data and operations. The key to ST-Hadoop's performance gain is its ability to index spatio-temporal data within HDFS.
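The abstract attributes ST-Hadoop's speedup to laying out data in HDFS like a spatio-temporal index, so that a query only reads the partitions that can contain matching records. The following is a minimal, hypothetical Python sketch of that idea, not ST-Hadoop's actual index or API. It assumes fixed-width temporal slices, each divided by a uniform spatial grid; the names `STPoint`, `build_partitions`, `range_query`, `slice_seconds`, and `cell_size` are illustrative only, and the real system runs on Hadoop and SpatialHadoop rather than on in-memory lists.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class STPoint:
    """A hypothetical spatio-temporal record: 2D location plus a timestamp."""
    x: float
    y: float
    t: int  # assumption: timestamp in seconds


def partition_key(p, slice_seconds, cell_size):
    """Assign a record to (time slice, grid column, grid row).

    Assumption for illustration: temporal slicing first, then a uniform grid
    inside each slice.
    """
    return (p.t // slice_seconds, int(p.x // cell_size), int(p.y // cell_size))


def build_partitions(points, slice_seconds, cell_size):
    """Group records into partitions, standing in for files/blocks on HDFS."""
    parts = defaultdict(list)
    for p in points:
        parts[partition_key(p, slice_seconds, cell_size)].append(p)
    return dict(parts)


def range_query(parts, slice_seconds, cell_size,
                xmin, ymin, xmax, ymax, tmin, tmax):
    """Spatio-temporal range query that skips non-overlapping partitions.

    A partition (ts, cx, cy) covers time [ts*S, (ts+1)*S) and space
    [cx*C, (cx+1)*C) x [cy*C, (cy+1)*C). Returns (matches, partitions_scanned).
    """
    result, scanned = [], 0
    for (ts, cx, cy), bucket in parts.items():
        # Prune on time, then on each spatial axis, before reading the bucket.
        if (ts + 1) * slice_seconds <= tmin or ts * slice_seconds > tmax:
            continue
        if (cx + 1) * cell_size <= xmin or cx * cell_size > xmax:
            continue
        if (cy + 1) * cell_size <= ymin or cy * cell_size > ymax:
            continue
        scanned += 1
        # Refine: exact filter on the records of an overlapping partition.
        result.extend(p for p in bucket
                      if xmin <= p.x <= xmax and ymin <= p.y <= ymax
                      and tmin <= p.t <= tmax)
    return result, scanned


pts = [STPoint(1.0, 1.0, 10), STPoint(15.0, 3.0, 20),
       STPoint(2.0, 2.0, 5000), STPoint(3.0, 4.0, 50)]
parts = build_partitions(pts, slice_seconds=3600, cell_size=10.0)
hits, scanned = range_query(parts, 3600, 10.0, 0, 0, 5, 5, 0, 100)
print(len(hits), scanned)  # 2 1
```

In this toy run, three partitions are built but only one is scanned: the other two are ruled out by their time slice or grid cell alone. This filter-then-refine pattern is the general reason partition-aware layouts beat a full scan; the actual partitioning strategy, hierarchy, and query algorithms in ST-Hadoop are described in the paper itself.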

Original language: English
Pages (from-to): 1-29
Number of pages: 29
Journal: GeoInformatica
DOIs: 10.1007/s10707-018-0325-6
Publication status: Accepted/In press - 5 Jul 2018
Externally published: Yes

Keywords

  • MapReduce-based systems
  • Spatio-temporal join query
  • Spatio-temporal nearest neighbor query
  • Spatio-temporal range query
  • Spatio-temporal systems

ASJC Scopus subject areas

  • Information Systems
  • Geography, Planning and Development

Cite this

ST-Hadoop : a MapReduce framework for spatio-temporal data. / Alarabi, Louai; Mokbel, Mohamed; Musleh, Mashaal.

In: GeoInformatica, 05.07.2018, p. 1-29.

Alarabi, Louai ; Mokbel, Mohamed ; Musleh, Mashaal. / ST-Hadoop : a MapReduce framework for spatio-temporal data. In: GeoInformatica. 2018 ; pp. 1-29.
@article{436177b3219948d89db2ff7724350ed9,
title = "ST-Hadoop: a MapReduce framework for spatio-temporal data",
keywords = "MapReduce-based systems, Spatio-temporal join query, Spatio-temporal nearest neighbor query, Spatio-temporal range query, Spatio-temporal systems",
author = "Louai Alarabi and Mohamed Mokbel and Mashaal Musleh",
year = "2018",
month = "7",
day = "5",
doi = "10.1007/s10707-018-0325-6",
language = "English",
pages = "1--29",
journal = "GeoInformatica",
issn = "1384-6175",
publisher = "Kluwer Academic Publishers",

}

TY  - JOUR
T1  - ST-Hadoop
T2  - a MapReduce framework for spatio-temporal data
AU  - Alarabi, Louai
AU  - Mokbel, Mohamed
AU  - Musleh, Mashaal
PY  - 2018/7/5
Y1  - 2018/7/5
KW  - MapReduce-based systems
KW  - Spatio-temporal join query
KW  - Spatio-temporal nearest neighbor query
KW  - Spatio-temporal range query
KW  - Spatio-temporal systems
UR  - http://www.scopus.com/inward/record.url?scp=85049576414&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85049576414&partnerID=8YFLogxK
U2  - 10.1007/s10707-018-0325-6
DO  - 10.1007/s10707-018-0325-6
M3  - Article
AN  - SCOPUS:85049576414
SP  - 1
EP  - 29
JO  - GeoInformatica
JF  - GeoInformatica
SN  - 1384-6175
ER  -