### Abstract

Data cube computation and representation are prohibitively expensive in terms of time and space. Prior work has focused on either reducing the computation time or condensing the representation of a data cube. In this paper, we introduce Range Cubing as an efficient way to compute and compress the data cube without any loss of precision. A new data structure, the range trie, is used to identify correlation in attribute values and to compress the input dataset, effectively reducing the computational cost. The range cubing algorithm generates a compressed cube, called a range cube, which partitions all cells into disjoint ranges. Each range represents a subset of cells with the same aggregation value and is expressed as a tuple with the same number of dimensions as the input data tuples. The range cube preserves the roll-up/drill-down semantics of a data cube. In experiments on a real dataset, range cubing runs in less than one thirtieth of the time of H-Cubing while generating a range cube that occupies less than one ninth of the space of the full cube, when both algorithms run in their preferred dimension orders. On synthetic data, range cubing demonstrates much better scalability, as well as higher adaptiveness to both data sparsity and skew.
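To make the idea of cells sharing an aggregation value concrete, the following is a minimal brute-force sketch, not the range trie or the range cubing algorithm described in the abstract. It enumerates every non-empty cell of a tiny cube and groups cells that cover exactly the same input rows, since such cells necessarily have identical aggregates. The toy rows, the `*` wildcard, and the function names `full_cube` and `group_by_cover` are all assumptions made for illustration; the grouping shown is also coarser in spirit than the paper's ranges, which are not reproduced here.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # hypothetical wildcard meaning "any value" in a dimension


def full_cube(rows, n_dims):
    """Brute force: map every non-empty cell to the indices of the rows it covers."""
    cells = defaultdict(set)
    for idx, row in enumerate(rows):
        dims = row[:n_dims]
        # Each cell covering this row either keeps or wildcards each dimension.
        for mask in product((False, True), repeat=n_dims):
            cell = tuple(ALL if wild else value for wild, value in zip(mask, dims))
            cells[cell].add(idx)
    return cells


def group_by_cover(cells):
    """Group cells that cover exactly the same rows; they share every aggregate."""
    groups = defaultdict(list)
    for cell, cover in cells.items():
        groups[frozenset(cover)].append(cell)
    return groups


# Hypothetical toy data: (store, city, product, sales).
# The store determines the city, so the two attributes are correlated.
rows = [
    ("S1", "Boston", "P1", 10),
    ("S1", "Boston", "P2", 20),
    ("S2", "Chicago", "P1", 5),
]

cells = full_cube(rows, n_dims=3)
groups = group_by_cover(cells)

# SUM(sales) per group: one stored value stands in for every cell in the group.
sums = {cover: sum(rows[i][3] for i in cover) for cover in groups}

print(len(cells), len(groups))  # → 18 6
```

In this toy input, the cells `(S1, Boston, *)`, `(S1, *, *)` and `(*, Boston, *)` all cover the same two rows and therefore share the sum 30, which is the kind of redundancy that correlated attribute values introduce into a full cube.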

Original language | English |
---|---|
Pages | 658-669 |
Number of pages | 12 |
Publication status | Published - 1 Jun 2004 |
Event | Proceedings - 20th International Conference on Data Engineering - ICDE 2004 - Boston, MA, United States. Duration: 30 Mar 2004 → 2 Apr 2004 |

### Other

Other | Proceedings - 20th International Conference on Data Engineering - ICDE 2004 |
---|---|
Country | United States |
City | Boston, MA |
Period | 30/3/04 → 2/4/04 |

### ASJC Scopus subject areas

- Software
- Signal Processing
- Information Systems

### Cite this

*Range CUBE: Efficient cube computation by exploiting data correlation*. 658-669. Paper presented at the 20th International Conference on Data Engineering (ICDE 2004), Boston, MA, United States.