Range CUBE: Efficient cube computation by exploiting data correlation

Ying Feng, Divyakant Agrawal, Amr El Abbadi, Ahmed Metwally

Research output: Contribution to conferencePaper

24 Citations (Scopus)

Abstract

Data cube computation and representation are prohibitively expensive in terms of time and space. Prior work has focused on either reducing the computation time or condensing the representation of a data cube. In this paper, we introduce Range Cubing as an efficient way to compute and compress the data cube without any loss of precision. A new data structure, range trie, is used to compress and identify correlation in attribute values, and compress the input dataset to effectively reduce the computational cost. The range cubing algorithm generates a compressed cube, called range cube, which partitions all cells into disjoint ranges. Each range represents a subset of cells with the same aggregation value, as a tuple which has the same number of dimensions as the input data tuples. The range cube preserves the roll-up/drill-down semantics of a data cube. Compared to H-Cubing, experiments on real dataset show a running time of less than one thirtieth, still generating a range cube of less than one ninth of the space of the full cube, when both algorithms run in their preferred dimension orders. On synthetic data, range cubing demonstrates much better scalability, as well as higher adaptiveness to both data sparsity and skew.

Original languageEnglish
Pages658-669
Number of pages12
Publication statusPublished - 1 Jun 2004
EventProceedings - 20th International Conference on Data Engineering - ICDE 2004 - Boston, MA., United States
Duration: 30 Mar 20042 Apr 2004

Other

OtherProceedings - 20th International Conference on Data Engineering - ICDE 2004
CountryUnited States
CityBoston, MA.
Period30/3/042/4/04

    Fingerprint

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems

Cite this

Feng, Y., Agrawal, D., El Abbadi, A., & Metwally, A. (2004). Range CUBE: Efficient cube computation by exploiting data correlation. 658-669. Paper presented at Proceedings - 20th International Conference on Data Engineering - ICDE 2004, Boston, MA., United States.