Applying the golden rule of sampling for query estimation

Yi Leh Wu, Divyakant Agrawal, Amr El Abbadi

Research output: Contribution to journal (Article)

9 Citations (Scopus)

Abstract

Query size estimation is crucial for many database system components. In particular, query optimizers need efficient and accurate query size estimation when deciding among alternative query plans. In this paper we propose a novel sampling technique for estimating range queries, based on the golden rule of sampling introduced by von Neumann in 1947. The proposed technique randomly samples the frequency domain using the cumulative frequency distribution and yields good estimates without any a priori knowledge of the actual underlying distribution of spatial objects. We show experimentally that the proposed sampling technique gives smaller approximation error than the Min-Skew histogram-based and wavelet-based approaches on both synthetic and real datasets. Moreover, the proposed technique can be easily extended to higher-dimensional datasets.
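The abstract does not spell out the algorithm, so the following is only a minimal one-dimensional sketch of the general idea it names: drawing samples by inverting a cumulative frequency distribution (inverse-transform sampling) and scaling the hit ratio to estimate a range query. The function names (`build_cfd`, `draw_samples`, `estimate_range_count`), the integer-valued domain, and the toy frequencies are assumptions made for illustration and are not taken from the paper.

```python
import bisect
import random
from itertools import accumulate


def build_cfd(frequencies):
    """Return the cumulative frequency distribution of per-value counts.

    frequencies[i] is the (assumed known) number of records with value i.
    """
    return list(accumulate(frequencies))


def draw_samples(cfd, k, rng):
    """Inverse-transform sampling over the cumulative frequencies.

    Pick a uniform position in [0, total) and map it back to the domain
    value whose cumulative count first exceeds that position, so values
    are drawn in proportion to their frequency.
    """
    total = cfd[-1]
    return [bisect.bisect_right(cfd, rng.randrange(total)) for _ in range(k)]


def estimate_range_count(samples, total, lo, hi):
    """Scale the fraction of samples inside [lo, hi] up to the data size."""
    hits = sum(1 for v in samples if lo <= v <= hi)
    return total * hits / len(samples)


if __name__ == "__main__":
    freqs = [5, 0, 20, 40, 25, 10]      # hypothetical toy data
    cfd = build_cfd(freqs)
    rng = random.Random(0)              # fixed seed for repeatability
    samples = draw_samples(cfd, 1000, rng)
    estimate = estimate_range_count(samples, cfd[-1], 2, 3)
    print("estimated:", estimate, "exact:", sum(freqs[2:4]))
```

Because values are drawn in proportion to their frequency, skewed regions of the domain receive more samples without any prior model of the distribution. A real estimator would keep only the samples, not the full frequency vector.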

Original language: English
Pages (from-to): 449-460
Number of pages: 12
Journal: SIGMOD Record (ACM Special Interest Group on Management of Data)
Volume: 30
Issue number: 2
Publication status: Published - 1 Jun 2001
Externally published: Yes

Keywords

  • Cumulative frequency distribution
  • Query estimation
  • Random sampling
  • Range query

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Information Systems
  • Software

Cite this

Applying the golden rule of sampling for query estimation. / Wu, Yi Leh; Agrawal, Divyakant; El Abbadi, Amr.

In: SIGMOD Record (ACM Special Interest Group on Management of Data), Vol. 30, No. 2, 01.06.2001, p. 449-460.

@article{eb33dc8a24614e3eae676fd69a15591d,
title = "Applying the golden rule of sampling for query estimation",
keywords = "Cumulative frequency distribution, Query estimation, Random sampling, Range query",
author = "Wu, {Yi Leh} and Agrawal, Divyakant and {El Abbadi}, Amr",
year = "2001",
month = "6",
day = "1",
language = "English",
volume = "30",
pages = "449--460",
journal = "SIGMOD Record",
issn = "0163-5808",
publisher = "Association for Computing Machinery (ACM)",
number = "2",
}

TY - JOUR
T1 - Applying the golden rule of sampling for query estimation
AU - Wu, Yi Leh
AU - Agrawal, Divyakant
AU - El Abbadi, Amr
PY - 2001/6/1
Y1 - 2001/6/1
KW - Cumulative frequency distribution
KW - Query estimation
KW - Random sampling
KW - Range query
UR - http://www.scopus.com/inward/record.url?scp=0038968612&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0038968612&partnerID=8YFLogxK
M3 - Article
VL - 30
SP - 449
EP - 460
JO - SIGMOD Record
JF - SIGMOD Record
SN - 0163-5808
IS - 2
ER -