### Abstract

Data cube computation and representation are prohibitively expensive in terms of time and space. Prior work has focused on either reducing the computation time or condensing the representation of a data cube. In this paper, we introduce Range Cubing as an efficient way to compute and compress the data cube without any loss of precision. A new data structure, the range trie, is used to identify and compress correlation in attribute values, and to compress the input dataset so as to reduce the computational cost. The range cubing algorithm generates a compressed cube, called a range cube, which partitions all cells into disjoint ranges. Each range represents a subset of cells with the same aggregation value, expressed as a tuple with the same number of dimensions as the input data tuples. The range cube preserves the roll-up/drill-down semantics of a data cube. Compared to H-Cubing, experiments on a real dataset show a running time of less than one thirtieth, while still generating a range cube that takes less than one ninth of the space of the full cube, when both algorithms run in their preferred dimension orders. On synthetic data, range cubing demonstrates much better scalability, as well as higher adaptiveness to both data sparsity and skew.
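To make the idea of "cells with the same aggregation value" concrete, the following is a minimal brute-force sketch, not the paper's range trie or range cubing algorithm. It enumerates every cell of a tiny cube and groups cells that are covered by exactly the same input tuples, since such cells necessarily share one aggregate. The toy data, the dimension names, and the helpers `full_cube` and `group_by_cover` are all assumptions made for illustration; how the paper organizes such cells into ranges may differ.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # wildcard marking an aggregated-away dimension

def full_cube(rows, n_dims):
    """Map every non-empty cell to the frozenset of row indices it covers."""
    cover = defaultdict(set)
    for idx, row in enumerate(rows):
        dims = row[:n_dims]
        # Each input tuple contributes to 2^n_dims cells.
        for mask in product((False, True), repeat=n_dims):
            cell = tuple(v if keep else ALL for v, keep in zip(dims, mask))
            cover[cell].add(idx)
    return {cell: frozenset(ix) for cell, ix in cover.items()}

def group_by_cover(cube):
    """Group cells covered by the same rows; each group shares one aggregate."""
    groups = defaultdict(list)
    for cell, ix in cube.items():
        groups[ix].append(cell)
    return dict(groups)

# Hypothetical toy data: (store, city, product, sales).
# Store determines city, so the two attributes are correlated.
rows = [
    ("s1", "c1", "p1", 10),
    ("s1", "c1", "p2", 20),
    ("s2", "c2", "p1", 5),
]

cube = full_cube(rows, n_dims=3)
groups = group_by_cover(cube)

def total(ix):
    """SUM aggregate over the rows in a cover set."""
    return sum(rows[i][3] for i in ix)

for ix, cells in sorted(groups.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(ix), total(ix), sorted(cells))
print(len(cube), "cells fall into", len(groups), "groups")
```

Because store and city are correlated in this toy input, cells such as `("s1", "*", "*")` and `("*", "c1", "*")` land in the same group, which is the kind of redundancy a compressed cube can avoid storing more than once.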

Original language | English |
---|---|
Title of host publication | Proceedings - International Conference on Data Engineering |
Pages | 658-669 |
Number of pages | 12 |
Volume | 20 |
Publication status | Published - 1 Jun 2004 |
Externally published | Yes |
Event | Proceedings - 20th International Conference on Data Engineering - ICDE 2004 - Boston, MA., United States. Duration: 30 Mar 2004 → 2 Apr 2004 |

### Other

Other | Proceedings - 20th International Conference on Data Engineering - ICDE 2004 |
---|---|
Country | United States |
City | Boston, MA. |
Period | 30/3/04 → 2/4/04 |

### ASJC Scopus subject areas

- Software
- Engineering(all)
- Engineering (miscellaneous)

### Cite this

*Proceedings - International Conference on Data Engineering* (Vol. 20, pp. 658-669)

**Range CUBE: Efficient cube computation by exploiting data correlation.** / Feng, Ying; Agrawal, Divyakant; El Abbadi, Amr; Metwally, Ahmed.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

*Proceedings - International Conference on Data Engineering.* vol. 20, pp. 658-669, Proceedings - 20th International Conference on Data Engineering - ICDE 2004, Boston, MA., United States, 30/3/04.

TY - GEN

T1 - Range CUBE

T2 - Efficient cube computation by exploiting data correlation

AU - Feng, Ying

AU - Agrawal, Divyakant

AU - El Abbadi, Amr

AU - Metwally, Ahmed

PY - 2004/6/1

Y1 - 2004/6/1

UR - http://www.scopus.com/inward/record.url?scp=2442586458&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=2442586458&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:2442586458

VL - 20

SP - 658

EP - 669

BT - Proceedings - International Conference on Data Engineering

ER -