HaCube

Extending MapReduce for efficient OLAP cube materialization and view maintenance

Zhengkui Wang, Yan Chu, Kian Lee Tan, Divyakant Agrawal, Amr El Abbadi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Data cubes are widely used as a powerful tool to provide multi-dimensional views in data warehousing and On-Line Analytical Processing (OLAP). However, with increasing data sizes, it is becoming computationally expensive to perform data cube analysis. In this paper, we introduce HaCube, an extension of MapReduce, designed for efficient parallel data cube computation on large-scale data. We also provide a general data cube materialization solution which is able to facilitate the features in MapReduce-like systems towards an efficient data cube computation. Furthermore, we demonstrate how HaCube supports view maintenance through either incremental computation (e.g. used for SUM or COUNT) or recomputation (e.g. used for MEDIAN or CORRELATION). We implement HaCube by extending Hadoop and evaluate it based on the TPC-D benchmark over billions of tuples on a cluster with over 320 cores. The experimental results demonstrate the efficiency, scalability and practicality of HaCube for cube computation over a large amount of data in a distributed environment.

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 113-129
Number of pages: 17
Volume: 9643
ISBN (Print): 9783319320489
DOIs: https://doi.org/10.1007/978-3-319-32049-6_8
Publication status: Published - 2016
Externally published: Yes
Event: 21st International Conference on Database Systems for Advanced Applications, DASFAA 2016 - Dallas, United States
Duration: 16 Apr 2016 → 19 Apr 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9643
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 21st International Conference on Database Systems for Advanced Applications, DASFAA 2016
Country: United States
City: Dallas
Period: 16/4/16 → 19/4/16

Fingerprint

Data Cube
MapReduce
Regular hexahedron
Maintenance
Processing
Data warehouses
Data Warehousing
Scalability
Distributed Environment
Demonstrate
Benchmark
Evaluate
Experimental Results

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Wang, Z., Chu, Y., Tan, K. L., Agrawal, D., & Abbadi, A. E. (2016). HaCube: Extending MapReduce for efficient OLAP cube materialization and view maintenance. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9643, pp. 113-129). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9643). Springer Verlag. https://doi.org/10.1007/978-3-319-32049-6_8

HaCube: Extending MapReduce for efficient OLAP cube materialization and view maintenance. / Wang, Zhengkui; Chu, Yan; Tan, Kian Lee; Agrawal, Divyakant; Abbadi, Amr El.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 9643 Springer Verlag, 2016. p. 113-129 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9643).

Wang, Z, Chu, Y, Tan, KL, Agrawal, D & Abbadi, AE 2016, HaCube: Extending MapReduce for efficient OLAP cube materialization and view maintenance. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 9643, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9643, Springer Verlag, pp. 113-129, 21st International Conference on Database Systems for Advanced Applications, DASFAA 2016, Dallas, United States, 16/4/16. https://doi.org/10.1007/978-3-319-32049-6_8
Wang Z, Chu Y, Tan KL, Agrawal D, Abbadi AE. HaCube: Extending MapReduce for efficient OLAP cube materialization and view maintenance. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 9643. Springer Verlag. 2016. p. 113-129. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-32049-6_8
Wang, Zhengkui; Chu, Yan; Tan, Kian Lee; Agrawal, Divyakant; Abbadi, Amr El. / HaCube: Extending MapReduce for efficient OLAP cube materialization and view maintenance. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 9643 Springer Verlag, 2016. pp. 113-129 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{55fef6cc03a14140bf7f807f11386400,
title = "HaCube: Extending MapReduce for efficient OLAP cube materialization and view maintenance",
abstract = "Data cubes are widely used as a powerful tool to provide multi-dimensional views in data warehousing and On-Line Analytical Processing (OLAP). However, with increasing data sizes, it is becoming computationally expensive to perform data cube analysis. In this paper, we introduce HaCube, an extension of MapReduce, designed for efficient parallel data cube computation on large-scale data. We also provide a general data cube materialization solution which is able to facilitate the features in MapReduce-like systems towards an efficient data cube computation. Furthermore, we demonstrate how HaCube supports view maintenance through either incremental computation (e.g. used for SUM or COUNT) or recomputation (e.g. used for MEDIAN or CORRELATION). We implement HaCube by extending Hadoop and evaluate it based on the TPC-D benchmark over billions of tuples on a cluster with over 320 cores. The experimental results demonstrate the efficiency, scalability and practicality of HaCube for cube computation over a large amount of data in a distributed environment.",
author = "Zhengkui Wang and Yan Chu and Tan, {Kian Lee} and Divyakant Agrawal and Abbadi, {Amr El}",
year = "2016",
doi = "10.1007/978-3-319-32049-6_8",
language = "English",
isbn = "9783319320489",
volume = "9643",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "113--129",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - HaCube

T2 - Extending MapReduce for efficient OLAP cube materialization and view maintenance

AU - Wang, Zhengkui

AU - Chu, Yan

AU - Tan, Kian Lee

AU - Agrawal, Divyakant

AU - Abbadi, Amr El

PY - 2016

Y1 - 2016

AB - Data cubes are widely used as a powerful tool to provide multi-dimensional views in data warehousing and On-Line Analytical Processing (OLAP). However, with increasing data sizes, it is becoming computationally expensive to perform data cube analysis. In this paper, we introduce HaCube, an extension of MapReduce, designed for efficient parallel data cube computation on large-scale data. We also provide a general data cube materialization solution which is able to facilitate the features in MapReduce-like systems towards an efficient data cube computation. Furthermore, we demonstrate how HaCube supports view maintenance through either incremental computation (e.g. used for SUM or COUNT) or recomputation (e.g. used for MEDIAN or CORRELATION). We implement HaCube by extending Hadoop and evaluate it based on the TPC-D benchmark over billions of tuples on a cluster with over 320 cores. The experimental results demonstrate the efficiency, scalability and practicality of HaCube for cube computation over a large amount of data in a distributed environment.

UR - http://www.scopus.com/inward/record.url?scp=84962473694&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84962473694&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-32049-6_8

DO - 10.1007/978-3-319-32049-6_8

M3 - Conference contribution

SN - 9783319320489

VL - 9643

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 113

EP - 129

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

PB - Springer Verlag

ER -