Range CUBE: Efficient cube computation by exploiting data correlation

Ying Feng, Divyakant Agrawal, Amr El Abbadi, Ahmed Metwally

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

24 Citations (Scopus)

Abstract

Data cube computation and representation are prohibitively expensive in terms of time and space. Prior work has focused on either reducing the computation time or condensing the representation of a data cube. In this paper, we introduce Range Cubing as an efficient way to compute and compress the data cube without any loss of precision. A new data structure, the range trie, is used to identify and compress correlation in attribute values and to compress the input dataset, which effectively reduces the computational cost. The range cubing algorithm generates a compressed cube, called the range cube, which partitions all cells into disjoint ranges. Each range represents a subset of cells with the same aggregation value and is stored as a single tuple with the same number of dimensions as the input data tuples. The range cube preserves the roll-up/drill-down semantics of a data cube. Compared to H-Cubing, experiments on a real dataset show a running time of less than one thirtieth, while still generating a range cube that takes less than one ninth of the space of the full cube, when both algorithms run in their preferred dimension orders. On synthetic data, range cubing demonstrates much better scalability, as well as higher adaptiveness to both data sparsity and skew.
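The idea that many cube cells share one aggregation value when attributes are correlated can be illustrated with a small brute-force sketch. This is not the range trie or the range cubing algorithm from the paper: it materializes the full cube (which the paper's method is designed to avoid) and then groups cells that cover exactly the same input tuples. The toy dataset, the `ALL` marker, and the function names `full_cube`, `group_equivalent_cells`, and `sum_measure` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # hypothetical marker for an aggregated-away dimension


def full_cube(rows, n_dims):
    """Naively map every cube cell to the set of input row indices it covers."""
    cells = defaultdict(set)
    for idx, row in enumerate(rows):
        dims = row[:n_dims]
        # Each subset of kept dimensions yields one cell covering this row.
        for k in range(n_dims + 1):
            for kept in combinations(range(n_dims), k):
                cell = tuple(dims[d] if d in kept else ALL for d in range(n_dims))
                cells[cell].add(idx)
    return cells


def group_equivalent_cells(cells):
    """Group cells covering the same rows; such cells share every aggregate."""
    groups = defaultdict(list)
    for cell, covered in cells.items():
        groups[frozenset(covered)].append(cell)
    return groups


def sum_measure(rows, covered):
    """SUM over the last column (the measure) of the covered rows."""
    return sum(rows[i][-1] for i in covered)


# Toy data (assumed): store S1 only occurs with city C1, so those two
# attributes are perfectly correlated. The last column is the measure.
rows = [
    ("S1", "C1", "P1", 10),
    ("S1", "C1", "P2", 20),
    ("S2", "C2", "P1", 30),
]

cells = full_cube(rows, n_dims=3)
groups = group_equivalent_cells(cells)

print(len(cells), "cells collapse into", len(groups), "groups")
for covered, members in groups.items():
    print(sorted(covered), sum_measure(rows, covered), sorted(members))
```

On this toy input, the cells `(S1, *, *)`, `(*, C1, *)`, and `(S1, C1, *)` fall into one group because store and city always occur together. A compressed representation only needs to store one entry per group, which is the kind of saving the abstract attributes to correlation in attribute values.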

Original language: English
Title of host publication: Proceedings - International Conference on Data Engineering
Pages: 658-669
Number of pages: 12
Volume: 20
Publication status: Published - 1 Jun 2004
Externally published: Yes
Event: Proceedings - 20th International Conference on Data Engineering - ICDE 2004 - Boston, MA., United States
Duration: 30 Mar 2004 to 2 Apr 2004

Other

Other: Proceedings - 20th International Conference on Data Engineering - ICDE 2004
Country: United States
City: Boston, MA.
Period: 30/3/04 to 2/4/04

Fingerprint

Data structures
Scalability
Agglomeration
Semantics
Costs
Experiments

ASJC Scopus subject areas

  • Software
  • Engineering (all)
  • Engineering (miscellaneous)

Cite this

Feng, Y., Agrawal, D., El Abbadi, A., & Metwally, A. (2004). Range CUBE: Efficient cube computation by exploiting data correlation. In Proceedings - International Conference on Data Engineering (Vol. 20, pp. 658-669)

@inproceedings{9c85baf086aa41cfa38c9af2da050feb,
title = "Range CUBE: Efficient cube computation by exploiting data correlation",
author = "Ying Feng and Divyakant Agrawal and {El Abbadi}, Amr and Ahmed Metwally",
year = "2004",
month = "6",
day = "1",
language = "English",
volume = "20",
pages = "658--669",
booktitle = "Proceedings - International Conference on Data Engineering",

}

TY - GEN

T1 - Range CUBE

T2 - Efficient cube computation by exploiting data correlation

AU - Feng, Ying

AU - Agrawal, Divyakant

AU - El Abbadi, Amr

AU - Metwally, Ahmed

PY - 2004/6/1

Y1 - 2004/6/1

UR - http://www.scopus.com/inward/record.url?scp=2442586458&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=2442586458&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:2442586458

VL - 20

SP - 658

EP - 669

BT - Proceedings - International Conference on Data Engineering

ER -