Applying the golden rule of sampling for query estimation

Y. L. Wu, D. Agrawal, A. El Abbadi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

18 Citations (Scopus)

Abstract

Query size estimation is crucial for many database system components. In particular, query optimizers need efficient and accurate query size estimation when deciding among alternative query plans. In this paper we propose a novel sampling technique based on the golden rule of sampling, introduced by von Neumann in 1947, for estimating range queries. The proposed technique randomly samples the frequency domain using the cumulative frequency distribution and yields good estimates without any a priori knowledge of the actual underlying distribution of spatial objects. We show experimentally that the proposed sampling technique gives smaller approximation error than the Min-Skew histogram based and wavelet based approaches for both synthetic and real datasets. Moreover, the proposed technique can be easily extended for higher dimensional datasets.
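The sampling idea named in the abstract, drawing uniform random numbers and mapping them through the inverse of the cumulative frequency distribution, can be illustrated with a minimal one-dimensional sketch. This is not the authors' algorithm as published; it only shows generic inverse-transform sampling applied to range-count estimation. The function names (`build_cfd`, `draw_samples`, `estimate_range_count`), the integer attribute, and the toy data are assumptions made for illustration.

```python
import bisect
import random
from collections import Counter
from itertools import accumulate


def build_cfd(values):
    """Return (sorted distinct values, cumulative counts) for a 1-D attribute."""
    counts = Counter(values)
    keys = sorted(counts)
    cumulative = list(accumulate(counts[k] for k in keys))
    return keys, cumulative


def draw_samples(keys, cumulative, m, rng):
    """Inverse-transform sampling: map uniform draws through the inverse CFD."""
    total = cumulative[-1]
    samples = []
    for _ in range(m):
        u = rng.random() * total  # uniform point on the cumulative-count axis
        # First key whose cumulative count exceeds u; clamp guards against
        # floating-point rounding pushing u up to total.
        idx = min(bisect.bisect_right(cumulative, u), len(keys) - 1)
        samples.append(keys[idx])
    return samples


def estimate_range_count(samples, total, lo, hi):
    """Estimate how many records fall in the closed range [lo, hi]."""
    hits = sum(1 for s in samples if lo <= s <= hi)
    return total * hits / len(samples)


if __name__ == "__main__":
    # Hypothetical skewed toy data: 50 ones, 30 twos, 20 tens.
    data = [1] * 50 + [2] * 30 + [10] * 20
    keys, cumulative = build_cfd(data)
    samples = draw_samples(keys, cumulative, m=200, rng=random.Random(0))
    print(estimate_range_count(samples, len(data), lo=1, hi=2))
```

Because each value is drawn with probability proportional to its frequency, dense regions of the attribute domain receive proportionally more samples, and no assumption about the shape of the underlying distribution is needed. The estimate for a range is then the sampled fraction falling in that range, scaled by the total record count.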

Original language: English
Title of host publication: Proceedings of the ACM SIGMOD International Conference on Management of Data
Editors: T. Sellis, S. Mehrotra
Pages: 449-460
Number of pages: 12
Publication status: Published - 2001
Externally published: Yes
Event: 2001 ACM SIGMOD International Conference on Management of Data - Santa Barbara, CA, United States
Duration: 21 May 2001 to 24 May 2001

Other

Other: 2001 ACM SIGMOD International Conference on Management of Data
Country: United States
City: Santa Barbara, CA
Period: 21/5/01 to 24/5/01

Fingerprint

Sampling

Keywords

  • Cumulative frequency distribution
  • Query estimation
  • Random sampling
  • Range query

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Wu, Y. L., Agrawal, D., & El Abbadi, A. (2001). Applying the golden rule of sampling for query estimation. In T. Sellis, & S. Mehrotra (Eds.), Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 449-460)

Applying the golden rule of sampling for query estimation. / Wu, Y. L.; Agrawal, D.; El Abbadi, A.

Proceedings of the ACM SIGMOD International Conference on Management of Data. ed. / T. Sellis; S. Mehrotra. 2001. p. 449-460.

Wu, YL, Agrawal, D & El Abbadi, A 2001, Applying the golden rule of sampling for query estimation. in T Sellis & S Mehrotra (eds), Proceedings of the ACM SIGMOD International Conference on Management of Data. pp. 449-460, 2001 ACM SIGMOD International Conference on Management of Data, Santa Barbara, CA, United States, 21/5/01.
Wu YL, Agrawal D, El Abbadi A. Applying the golden rule of sampling for query estimation. In Sellis T, Mehrotra S, editors, Proceedings of the ACM SIGMOD International Conference on Management of Data. 2001. p. 449-460
Wu, Y. L. ; Agrawal, D. ; El Abbadi, A. / Applying the golden rule of sampling for query estimation. Proceedings of the ACM SIGMOD International Conference on Management of Data. editor / T. Sellis ; S. Mehrotra. 2001. pp. 449-460
@inproceedings{848b32c55af9443e829ca83113884ef6,
title = "Applying the golden rule of sampling for query estimation",
abstract = "Query size estimation is crucial for many database system components. In particular, query optimizers need efficient and accurate query size estimation when deciding among alternative query plans. In this paper we propose a novel sampling technique based on the golden rule of sampling, introduced by von Neumann in 1947, for estimating range queries. The proposed technique randomly samples the frequency domain using the cumulative frequency distribution and yields good estimates without any a priori knowledge of the actual underlying distribution of spatial objects. We show experimentally that the proposed sampling technique gives smaller approximation error than the Min-Skew histogram based and wavelet based approaches for both synthetic and real datasets. Moreover, the proposed technique can be easily extended for higher dimensional datasets.",
keywords = "Cumulative frequency distribution, Query estimation, Random sampling, Range query",
author = "Wu, {Y. L.} and D. Agrawal and {El Abbadi}, A.",
year = "2001",
language = "English",
pages = "449--460",
editor = "T. Sellis and S. Mehrotra",
booktitle = "Proceedings of the ACM SIGMOD International Conference on Management of Data",

}

TY - GEN

T1 - Applying the golden rule of sampling for query estimation

AU - Wu, Y. L.

AU - Agrawal, D.

AU - El Abbadi, A.

PY - 2001

Y1 - 2001

N2 - Query size estimation is crucial for many database system components. In particular, query optimizers need efficient and accurate query size estimation when deciding among alternative query plans. In this paper we propose a novel sampling technique based on the golden rule of sampling, introduced by von Neumann in 1947, for estimating range queries. The proposed technique randomly samples the frequency domain using the cumulative frequency distribution and yields good estimates without any a priori knowledge of the actual underlying distribution of spatial objects. We show experimentally that the proposed sampling technique gives smaller approximation error than the Min-Skew histogram based and wavelet based approaches for both synthetic and real datasets. Moreover, the proposed technique can be easily extended for higher dimensional datasets.

KW - Cumulative frequency distribution

KW - Query estimation

KW - Random sampling

KW - Range query

UR - http://www.scopus.com/inward/record.url?scp=0034825250&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0034825250&partnerID=8YFLogxK

M3 - Conference contribution

SP - 449

EP - 460

BT - Proceedings of the ACM SIGMOD International Conference on Management of Data

A2 - Sellis, T.

A2 - Mehrotra, S.

ER -