Applying the golden rule of sampling for query estimation

Yi Leh Wu, Divyakant Agrawal, Amr El Abbadi

Research output: Contribution to journal (Article)

9 Citations (Scopus)

Abstract

Query size estimation is crucial for many database system components. In particular, query optimizers need efficient and accurate query size estimates when deciding among alternative query plans. In this paper we propose a novel sampling technique for estimating range queries, based on the golden rule of sampling introduced by von Neumann in 1947. The proposed technique randomly samples the frequency domain using the cumulative frequency distribution and yields good estimates without any a priori knowledge of the actual underlying distribution of spatial objects. We show experimentally that the proposed sampling technique gives smaller approximation error than the Min-Skew histogram-based and wavelet-based approaches for both synthetic and real datasets. Moreover, the proposed technique can be easily extended to higher-dimensional datasets.
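
To make the sampling idea concrete, the following is a minimal Python sketch of inverse-transform sampling, which is one way to read the abstract's description of sampling through the cumulative frequency distribution: draw u uniformly from [0, 1) and map it through the inverse of the cumulative distribution. It is not the algorithm from the paper. The function names (build_cdf, draw_samples, estimate_range_count), the one-dimensional data, and the choice of sample size are all assumptions made for illustration.

```python
import bisect
import random


def build_cdf(values):
    """Return the sorted distinct values and their cumulative relative frequencies."""
    ordered = sorted(values)
    n = len(ordered)
    distinct, cumulative = [], []
    for i, v in enumerate(ordered):
        if distinct and distinct[-1] == v:
            # Same value as before: extend its cumulative frequency.
            cumulative[-1] = (i + 1) / n
        else:
            distinct.append(v)
            cumulative.append((i + 1) / n)
    return distinct, cumulative


def draw_samples(distinct, cumulative, k, rng):
    """Draw k values by mapping uniform numbers through the inverse CDF."""
    samples = []
    for _ in range(k):
        u = rng.random()  # uniform in [0, 1)
        # First position whose cumulative frequency exceeds u.
        idx = bisect.bisect_right(cumulative, u)
        idx = min(idx, len(distinct) - 1)  # guard against rounding at the top end
        samples.append(distinct[idx])
    return samples


def estimate_range_count(samples, lo, hi, total):
    """Estimate how many of `total` objects fall in [lo, hi] from the samples."""
    hits = sum(1 for s in samples if lo <= s <= hi)
    return total * hits / len(samples)
```

A caller would build the cumulative distribution once, draw a fixed number of samples with a seeded random.Random, and then answer each range query by scaling the fraction of samples inside the range by the dataset size. The estimate is approximate, and its error shrinks as the number of samples grows.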

Original language: English
Pages (from-to): 449-460
Number of pages: 12
Journal: SIGMOD Record (ACM Special Interest Group on Management of Data)
Volume: 30
Issue number: 2
Publication status: Published - Jun 2001

Keywords

  • Cumulative frequency distribution
  • Query estimation
  • Random sampling
  • Range query

ASJC Scopus subject areas

  • Software
  • Information Systems
