MD-HBase

Design and implementation of an elastic data infrastructure for cloud-scale location services

Shoji Nishimura, Sudipto Das, Divyakant Agrawal, Amr El Abbadi

Research output: Contribution to journal (Article)

56 Citations (Scopus)

Abstract

The ubiquity of location-enabled devices has resulted in a wide proliferation of location-based applications and services. To handle the growing scale, database management systems driving such location-based services (LBSs) must cope with high insert rates for location updates from millions of devices, while supporting efficient real-time analysis on the latest locations. Traditional DBMSs, equipped with multi-dimensional index structures, can efficiently handle spatio-temporal data. However, popular open-source relational database systems are overwhelmed by the high insertion rates, real-time querying requirements, and terabytes of data that these systems must handle. On the other hand, key-value stores can effectively support large-scale operation, but do not natively provide the multi-attribute accesses needed to support the rich querying functionality essential for LBSs. We present the design and implementation of MD-HBase, a scalable data management infrastructure for LBSs that bridges this gap between scale and functionality. Our approach leverages a multi-dimensional index structure layered over a key-value store. The underlying key-value store allows the system to sustain high insert throughput and large data volumes, while ensuring fault tolerance and high availability. The index layer, in turn, allows efficient multi-dimensional query processing. Our optimized query processing technique accesses only the index and storage-level entries that intersect the query region. We present the design of MD-HBase, which demonstrates how two standard index structures, the K-d tree and the Quad tree, can be layered over a range-partitioned key-value store to provide a scalable multi-dimensional data infrastructure. Our prototype implementation using HBase, a standard open-source key-value store, can handle hundreds of thousands of inserts per second on a modest 16-node cluster, while efficiently processing multi-dimensional range queries and nearest-neighbor queries in real time, with response times as low as a few hundred milliseconds.
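The layering idea described in the abstract can be illustrated with a small sketch. The abstract does not state how multi-dimensional points are turned into one-dimensional keys, so the sketch assumes bit interleaving (a Z-order curve), under which every Quad tree cell maps to one contiguous key range. `BITS`, `z_key`, `SortedKV`, and `range_query` are hypothetical names introduced for illustration only, and `SortedKV` is an in-memory stand-in for a range-partitioned store, not the HBase API.

```python
from bisect import bisect_left, bisect_right

BITS = 8  # bits per dimension (assumption: a tiny 256 x 256 grid for illustration)


def z_key(x: int, y: int, bits: int = BITS) -> int:
    """Interleave the bits of x and y into one Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i + 1)  # x bits go to odd positions
        key |= ((y >> i) & 1) << (2 * i)      # y bits go to even positions
    return key


class SortedKV:
    """Toy sorted key-value store that supports inclusive range scans."""

    def __init__(self):
        self._keys = []
        self._vals = []

    def put(self, key, value):
        i = bisect_left(self._keys, key)
        self._keys.insert(i, key)
        self._vals.insert(i, value)

    def scan(self, lo, hi):
        """Return all (key, value) pairs with lo <= key <= hi."""
        i = bisect_left(self._keys, lo)
        j = bisect_right(self._keys, hi)
        return list(zip(self._keys[i:j], self._vals[i:j]))


def range_query(store, qx0, qy0, qx1, qy1, bits=BITS):
    """Return values stored at points inside the inclusive query rectangle.

    The space is split Quad-tree style. Cells that miss the rectangle are
    pruned; cells fully inside it are read with a single key-range scan.
    """
    results = []

    def visit(x0, y0, size):
        x1, y1 = x0 + size - 1, y0 + size - 1
        if x1 < qx0 or x0 > qx1 or y1 < qy0 or y0 > qy1:
            return  # cell does not intersect the query region: skip it
        if qx0 <= x0 and x1 <= qx1 and qy0 <= y0 and y1 <= qy1:
            lo = z_key(x0, y0, bits)   # smallest key in this aligned cell
            hi = lo + size * size - 1  # the cell covers size*size consecutive keys
            results.extend(value for _, value in store.scan(lo, hi))
            return
        half = size // 2               # partial overlap: descend into quadrants
        for dx in (0, half):
            for dy in (0, half):
                visit(x0 + dx, y0 + dy, half)

    visit(0, 0, 1 << bits)
    return results
```

To use the sketch, store each point with `store.put(z_key(x, y), value)` and call `range_query(store, x_min, y_min, x_max, y_max)`. Only key ranges of cells that intersect the rectangle are scanned, which mirrors the pruning behavior the abstract describes.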

Original language: English
Pages (from-to): 289-319
Number of pages: 31
Journal: Distributed and Parallel Databases
Volume: 31
Issue number: 2
DOIs: 10.1007/s10619-012-7109-z
Publication status: Published - 1 Jun 2013
Externally published: Yes

Fingerprint

Query processing
Relational database systems
Location based services
Fault tolerance
Information management
Throughput
Availability
Processing
Query
Functionality
Open source
Database management systems

Keywords

  • Key-value stores
  • Location based services
  • Multi-dimensional data
  • Real time analysis

ASJC Scopus subject areas

  • Information Systems
  • Software
  • Hardware and Architecture
  • Information Systems and Management

Cite this

MD-HBase: Design and implementation of an elastic data infrastructure for cloud-scale location services. / Nishimura, Shoji; Das, Sudipto; Agrawal, Divyakant; El Abbadi, Amr.

In: Distributed and Parallel Databases, Vol. 31, No. 2, 01.06.2013, p. 289-319.

Nishimura, Shoji; Das, Sudipto; Agrawal, Divyakant; El Abbadi, Amr. / MD-HBase: Design and implementation of an elastic data infrastructure for cloud-scale location services. In: Distributed and Parallel Databases. 2013; Vol. 31, No. 2. pp. 289-319.
@article{9ad1dc0d4a0d42deaadd037bf8cf5bc3,
title = "MD-HBase: Design and implementation of an elastic data infrastructure for cloud-scale location services",
keywords = "Key-value stores, Location based services, Multi-dimensional data, Real time analysis",
author = "Shoji Nishimura and Sudipto Das and Divyakant Agrawal and {El Abbadi}, Amr",
year = "2013",
month = "6",
day = "1",
doi = "10.1007/s10619-012-7109-z",
language = "English",
volume = "31",
pages = "289--319",
journal = "Distributed and Parallel Databases",
issn = "0926-8782",
publisher = "Springer Netherlands",
number = "2",

}

TY - JOUR

T1 - MD-HBase

T2 - Design and implementation of an elastic data infrastructure for cloud-scale location services

AU - Nishimura, Shoji

AU - Das, Sudipto

AU - Agrawal, Divyakant

AU - El Abbadi, Amr

PY - 2013/6/1

Y1 - 2013/6/1

KW - Key-value stores

KW - Location based services

KW - Multi-dimensional data

KW - Real time analysis

UR - http://www.scopus.com/inward/record.url?scp=84877839383&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84877839383&partnerID=8YFLogxK

U2 - 10.1007/s10619-012-7109-z

DO - 10.1007/s10619-012-7109-z

M3 - Article

VL - 31

SP - 289

EP - 319

JO - Distributed and Parallel Databases

JF - Distributed and Parallel Databases

SN - 0926-8782

IS - 2

ER -