Kernel-Based skyline cardinality estimation

Zhenjie Zhang, Yin Yang, Ruichu Cai, Dimitris Papadias, Anthony Tung

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

31 Citations (Scopus)

Abstract

The skyline of a d-dimensional dataset consists of all points not dominated by others. The incorporation of the skyline operator into practical database systems necessitates an efficient and effective cardinality estimation module. However, existing theoretical work on this problem is limited to the case where all d dimensions are independent of each other, which rarely holds for real datasets. The state-of-the-art Log Sampling (LS) technique simply applies theoretical results for independent dimensions to non-independent data anyway, sometimes leading to large estimation errors. To solve this problem, we propose a novel Kernel-Based (KB) approach that approximates the skyline cardinality with nonparametric methods. Extensive experiments with various real datasets demonstrate that KB achieves high accuracy, even in cases where LS fails. At the same time, despite its numerical nature, the efficiency of KB is comparable to that of LS. Furthermore, we extend both LS and KB to the k-dominant skyline, which is commonly used instead of the conventional skyline for high-dimensional data.
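To make the quantity being estimated concrete, the following is a minimal brute-force sketch of the skyline and the k-dominant skyline. It is not the KB or LS estimator from the paper; it only computes the exact cardinality that such estimators try to approximate. The function names, the "smaller is better" convention, and the sample points are assumptions made for illustration. The k-dominance test follows the usual definition: p k-dominates q if p is no worse than q on at least k dimensions and strictly better on at least one of them.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q everywhere and strictly better somewhere.

    Assumption: smaller values are preferred on every dimension.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Exact skyline by pairwise comparison, O(n^2 * d)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def k_dominates(p: Point, q: Point, k: int) -> bool:
    """True if p is no worse than q on at least k dimensions
    and strictly better on at least one of them."""
    no_worse = sum(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse >= k and strictly_better


def k_dominant_skyline(points: List[Point], k: int) -> List[Point]:
    """Points that no other point k-dominates (hypothetical helper)."""
    return [p for p in points if not any(k_dominates(q, p, k) for q in points)]


if __name__ == "__main__":
    pts = [(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)]
    print(len(skyline(pts)))  # skyline cardinality: 3
```

With k equal to the number of dimensions, k-dominance coincides with ordinary dominance, so the two functions return the same points. Smaller k makes domination easier and the result smaller. A cardinality estimator is useful precisely because this exact computation is quadratic in the number of points.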

Original language: English
Title of host publication: SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems
Pages: 509-521
Number of pages: 13
DOI: https://doi.org/10.1145/1559845.1559899
Publication status: Published - 2009
Externally published: Yes
Event: International Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09 - Providence, RI, United States
Duration: 29 Jun 2009 to 2 Jul 2009

Other

Other: International Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09
Country: United States
City: Providence, RI
Period: 29/6/09 to 2/7/09

Fingerprint

Sampling
Error analysis
Experiments

Keywords

  • Cardinality estimation
  • Kernel
  • Non-parametric methods
  • Skyline

ASJC Scopus subject areas

  • Software

Cite this

Zhang, Z., Yang, Y., Cai, R., Papadias, D., & Tung, A. (2009). Kernel-Based skyline cardinality estimation. In SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems (pp. 509-521) https://doi.org/10.1145/1559845.1559899

@inproceedings{bbdd14f4c2684a7997f4617ca708300e,
title = "Kernel-Based skyline cardinality estimation",
keywords = "Cardinality estimation, Kernel, Non-parametric methods, Skyline",
author = "Zhenjie Zhang and Yin Yang and Ruichu Cai and Dimitris Papadias and Anthony Tung",
year = "2009",
doi = "10.1145/1559845.1559899",
language = "English",
isbn = "9781605585543",
pages = "509--521",
booktitle = "SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems",

}