Scalable computational geometry in MapReduce

Yuan Li, Ahmed Eldawy, Jie Xue, Nadezda Knorozova, Mohamed Mokbel, Ravi Janardan

Research output: Contribution to journal › Article

Abstract

Hadoop, which employs the MapReduce programming paradigm, has been widely accepted as the standard framework for analyzing big data in distributed environments. Unfortunately, this rich framework has not been exploited for processing large-scale computational geometry operations. This paper introduces CG_Hadoop, a suite of scalable and efficient MapReduce algorithms for various fundamental computational geometry operations, namely polygon union, Voronoi diagram, skyline, convex hull, farthest pair, and closest pair, which serve as key components of other geometric algorithms. For each computational geometry operation, CG_Hadoop has two versions: one for the Apache Hadoop system and one for SpatialHadoop, a Hadoop-based system that is better suited to spatial operations. These algorithms form the nucleus of a comprehensive MapReduce library of computational geometry operations. Extensive experiments on a cluster of 25 machines, over datasets of up to 3.8 billion records, show that CG_Hadoop achieves up to 14x and 115x better performance than traditional algorithms when using the Hadoop and SpatialHadoop systems, respectively.
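As an illustration of the general pattern the abstract describes, the sketch below shows how one of the listed operations, convex hull, can be split into a map phase and a reduce phase. It is not the CG_Hadoop implementation and it does not use Hadoop. It runs in a single process, and the function names (`partition`, `map_phase`, `reduce_phase`, `distributed_convex_hull`) and the round-robin split are assumptions made purely for illustration. The sketch relies on one known fact: the convex hull of a point set equals the convex hull of the union of the hulls of its subsets, so each split can safely discard its interior points before the final merge.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Iterable[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop the last vertex while it makes a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def partition(points: List[Point], n_parts: int) -> List[List[Point]]:
    """Hypothetical stand-in for input splits: round-robin assignment."""
    return [points[i::n_parts] for i in range(n_parts)]


def map_phase(split: List[Point]) -> List[Point]:
    """Each 'mapper' keeps only the hull vertices of its own split."""
    return convex_hull(split)


def reduce_phase(local_hulls: List[List[Point]]) -> List[Point]:
    """A single 'reducer' computes the hull of all surviving candidates."""
    candidates = [p for hull in local_hulls for p in hull]
    return convex_hull(candidates)


def distributed_convex_hull(points: List[Point], n_parts: int = 4) -> List[Point]:
    """Simulate the map and reduce phases sequentially in one process."""
    splits = partition(points, n_parts)
    return reduce_phase([map_phase(s) for s in splits])
```

In this sketch the reducer only sees the vertices that survived the local hull step, which is typically far fewer than the input. A spatially aware partitioning, such as the one the abstract attributes to SpatialHadoop, could plausibly prune more than the round-robin split assumed here, but that is not modeled.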

Original language: English
Journal: VLDB Journal
DOI: https://doi.org/10.1007/s00778-018-0534-5
Publication status: Accepted/In press - 1 Jan 2019

Fingerprint

Computational geometry
Processing

Keywords

  • Computational Geometry
  • Distributed Systems
  • Hadoop
  • MapReduce
  • Output-sensitive Algorithms

ASJC Scopus subject areas

  • Information Systems
  • Hardware and Architecture

Cite this

Li, Y., Eldawy, A., Xue, J., Knorozova, N., Mokbel, M., & Janardan, R. (Accepted/In press). Scalable computational geometry in MapReduce. VLDB Journal. https://doi.org/10.1007/s00778-018-0534-5

@article{fb048b6889d6404ba76ebaf79725df01,
title = "Scalable computational geometry in MapReduce",
abstract = "Hadoop, which employs the MapReduce programming paradigm, has been widely accepted as the standard framework for analyzing big data in distributed environments. Unfortunately, this rich framework has not been exploited for processing large-scale computational geometry operations. This paper introduces CG_Hadoop, a suite of scalable and efficient MapReduce algorithms for various fundamental computational geometry operations, namely polygon union, Voronoi diagram, skyline, convex hull, farthest pair, and closest pair, which serve as key components of other geometric algorithms. For each computational geometry operation, CG_Hadoop has two versions: one for the Apache Hadoop system and one for SpatialHadoop, a Hadoop-based system that is better suited to spatial operations. These algorithms form the nucleus of a comprehensive MapReduce library of computational geometry operations. Extensive experiments on a cluster of 25 machines, over datasets of up to 3.8 billion records, show that CG_Hadoop achieves up to 14x and 115x better performance than traditional algorithms when using the Hadoop and SpatialHadoop systems, respectively.",
keywords = "Computational Geometry, Distributed Systems, Hadoop, MapReduce, Output-sensitive Algorithms",
author = "Yuan Li and Ahmed Eldawy and Jie Xue and Nadezda Knorozova and Mohamed Mokbel and Ravi Janardan",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/s00778-018-0534-5",
language = "English",
journal = "VLDB Journal",
issn = "1066-8888",
publisher = "Springer New York",
}

TY - JOUR

T1 - Scalable computational geometry in MapReduce

AU - Li, Yuan

AU - Eldawy, Ahmed

AU - Xue, Jie

AU - Knorozova, Nadezda

AU - Mokbel, Mohamed

AU - Janardan, Ravi

PY - 2019/1/1

Y1 - 2019/1/1

AB - Hadoop, which employs the MapReduce programming paradigm, has been widely accepted as the standard framework for analyzing big data in distributed environments. Unfortunately, this rich framework has not been exploited for processing large-scale computational geometry operations. This paper introduces CG_Hadoop, a suite of scalable and efficient MapReduce algorithms for various fundamental computational geometry operations, namely polygon union, Voronoi diagram, skyline, convex hull, farthest pair, and closest pair, which serve as key components of other geometric algorithms. For each computational geometry operation, CG_Hadoop has two versions: one for the Apache Hadoop system and one for SpatialHadoop, a Hadoop-based system that is better suited to spatial operations. These algorithms form the nucleus of a comprehensive MapReduce library of computational geometry operations. Extensive experiments on a cluster of 25 machines, over datasets of up to 3.8 billion records, show that CG_Hadoop achieves up to 14x and 115x better performance than traditional algorithms when using the Hadoop and SpatialHadoop systems, respectively.

KW - Computational Geometry

KW - Distributed Systems

KW - Hadoop

KW - MapReduce

KW - Output-sensitive Algorithms

UR - http://www.scopus.com/inward/record.url?scp=85060150253&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85060150253&partnerID=8YFLogxK

U2 - 10.1007/s00778-018-0534-5

DO - 10.1007/s00778-018-0534-5

M3 - Article

JO - VLDB Journal

JF - VLDB Journal

SN - 1066-8888

ER -