Design and implementation of a real-time interactive analytics system for large spatio-temporal data

Shiming Zhang, Yin Yang, Wei Fan, Marianne Winslett

Research output: Contribution to journalArticle

6 Citations (Scopus)

Abstract

In real-time interactive data analytics, the user expects to receive the results of each query within a short time period such as seconds. This is especially challenging when the data is big (e.g., on the scale of petabytes), and the analytics system runs on top of cloud infrastructure (e.g., thousands of interconnected commodity servers). We have been building such a system, called OceanRT, for managing large spatio-temporal data such as call logs and mobile web browsing records collected by a telecommunication company. Although there already exist systems for querying big data in real time, OceanRT's performance stands out due to several novel designs and components that address key efficiency and scalability issues that were largely overlooked in existing systems. First, OceanRT makes extensive use of software RDMA one-sided operations, which reduce networking costs without requiring specialized hardware. Second, OceanRT exploits the parallel computing capabilities of each node in the cloud through a novel architecture consisting of Access-Query Engines (AQEs) connected with minimal overhead. Third, OceanRT contains a novel storage scheme that optimizes for queries with joins and multi-dimensional selections, which are common for large spatiotemporal data. Experiments using the TPC-DS benchmark show that OceanRT is usually more than an order of magnitude faster than the current state-of-the-art systems.

Original languageEnglish
Pages (from-to)1754-1759
Number of pages6
JournalProceedings of the VLDB Endowment
Volume7
Issue number13
Publication statusPublished - 2014
Externally publishedYes

Fingerprint

Parallel processing systems
Telecommunication
Scalability
Servers
Engines
Hardware
Costs
Industry
Experiments
Big data

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science(all)

Cite this

Design and implementation of a real-time interactive analytics system for large spatio-temporal data. / Zhang, Shiming; Yang, Yin; Fan, Wei; Winslett, Marianne.

In: Proceedings of the VLDB Endowment, Vol. 7, No. 13, 2014, p. 1754-1759.

Research output: Contribution to journalArticle

@article{eea2a24261244613a87d474bfdf6b938,
title = "Design and implementation of a real-time interactive analytics system for large spatio-temporal data",
abstract = "In real-time interactive data analytics, the user expects to receive the results of each query within a short time period such as seconds. This is especially challenging when the data is big (e.g., on the scale of petabytes), and the analytics system runs on top of cloud infrastructure (e.g., thousands of interconnected commodity servers). We have been building such a system, called OceanRT, for managing large spatio-temporal data such as call logs and mobile web browsing records collected by a telecommunication company. Although there already exist systems for querying big data in real time, OceanRT's performance stands out due to several novel designs and components that address key efficiency and scalability issues that were largely overlooked in existing systems. First, OceanRT makes extensive use of software RDMA one-sided operations, which reduce networking costs without requiring specialized hardware. Second, OceanRT exploits the parallel computing capabilities of each node in the cloud through a novel architecture consisting of Access-Query Engines (AQEs) connected with minimal overhead. Third, OceanRT contains a novel storage scheme that optimizes for queries with joins and multi-dimensional selections, which are common for large spatiotemporal data. Experiments using the TPC-DS benchmark show that OceanRT is usually more than an order of magnitude faster than the current state-of-the-art systems.",
author = "Shiming Zhang and Yin Yang and Wei Fan and Marianne Winslett",
year = "2014",
language = "English",
volume = "7",
pages = "1754--1759",
journal = "Proceedings of the VLDB Endowment",
issn = "2150-8097",
publisher = "Very Large Data Base Endowment Inc.",
number = "13",

}

TY - JOUR

T1 - Design and implementation of a real-time interactive analytics system for large spatio-temporal data

AU - Zhang, Shiming

AU - Yang, Yin

AU - Fan, Wei

AU - Winslett, Marianne

PY - 2014

Y1 - 2014

N2 - In real-time interactive data analytics, the user expects to receive the results of each query within a short time period such as seconds. This is especially challenging when the data is big (e.g., on the scale of petabytes), and the analytics system runs on top of cloud infrastructure (e.g., thousands of interconnected commodity servers). We have been building such a system, called OceanRT, for managing large spatio-temporal data such as call logs and mobile web browsing records collected by a telecommunication company. Although there already exist systems for querying big data in real time, OceanRT's performance stands out due to several novel designs and components that address key efficiency and scalability issues that were largely overlooked in existing systems. First, OceanRT makes extensive use of software RDMA one-sided operations, which reduce networking costs without requiring specialized hardware. Second, OceanRT exploits the parallel computing capabilities of each node in the cloud through a novel architecture consisting of Access-Query Engines (AQEs) connected with minimal overhead. Third, OceanRT contains a novel storage scheme that optimizes for queries with joins and multi-dimensional selections, which are common for large spatiotemporal data. Experiments using the TPC-DS benchmark show that OceanRT is usually more than an order of magnitude faster than the current state-of-the-art systems.

AB - In real-time interactive data analytics, the user expects to receive the results of each query within a short time period such as seconds. This is especially challenging when the data is big (e.g., on the scale of petabytes), and the analytics system runs on top of cloud infrastructure (e.g., thousands of interconnected commodity servers). We have been building such a system, called OceanRT, for managing large spatio-temporal data such as call logs and mobile web browsing records collected by a telecommunication company. Although there already exist systems for querying big data in real time, OceanRT's performance stands out due to several novel designs and components that address key efficiency and scalability issues that were largely overlooked in existing systems. First, OceanRT makes extensive use of software RDMA one-sided operations, which reduce networking costs without requiring specialized hardware. Second, OceanRT exploits the parallel computing capabilities of each node in the cloud through a novel architecture consisting of Access-Query Engines (AQEs) connected with minimal overhead. Third, OceanRT contains a novel storage scheme that optimizes for queries with joins and multi-dimensional selections, which are common for large spatiotemporal data. Experiments using the TPC-DS benchmark show that OceanRT is usually more than an order of magnitude faster than the current state-of-the-art systems.

UR - http://www.scopus.com/inward/record.url?scp=84905815260&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84905815260&partnerID=8YFLogxK

M3 - Article

VL - 7

SP - 1754

EP - 1759

JO - Proceedings of the VLDB Endowment

JF - Proceedings of the VLDB Endowment

SN - 2150-8097

IS - 13

ER -