CORDS: Automatic discovery of correlations and soft functional dependencies

Ihab F. Ilyas, Volker Markl, Peter Haas, Paul Brown, Ashraf Aboulnaga

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

173 Citations (Scopus)

Abstract

The rich dependency structure found in the columns of real-world relational databases can be exploited to great advantage, but can also cause query optimizers - which usually assume that columns are statistically independent - to underestimate the selectivities of conjunctive predicates by orders of magnitude. We introduce CORDS, an efficient and scalable tool for automatic discovery of correlations and soft functional dependencies between columns. CORDS searches for column pairs that might have interesting and useful dependency relations by systematically enumerating candidate pairs and simultaneously pruning unpromising candidates using a flexible set of heuristics. A robust chi-squared analysis is applied to a sample of column values in order to identify correlations, and the number of distinct values in the sampled columns is analyzed to detect soft functional dependencies. CORDS can be used as a data mining tool, producing dependency graphs that are of intrinsic interest. We focus primarily on the use of CORDS in query optimization. Specifically, CORDS recommends groups of columns on which to maintain certain simple joint statistics. These
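The two sample-based checks named in the abstract can be illustrated with a minimal sketch. This is not the CORDS implementation: it uses a plain Pearson chi-squared statistic rather than the paper's robust variant, omits the candidate enumeration and pruning heuristics, and the distinct-value ratio in `soft_fd_strength` is an assumed way of turning distinct-value counts into a score. All function names and the toy table are hypothetical.

```python
import random
from collections import Counter


def sample_pairs(rows, k, seed=0):
    """Draw a simple random sample of k (a, b) value pairs without replacement."""
    return random.Random(seed).sample(rows, k)


def chi_squared(pairs):
    """Return (Pearson chi-squared statistic, degrees of freedom) for (a, b) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    stat = 0.0
    for a, n_a in left.items():
        for b, n_b in right.items():
            expected = n_a * n_b / n  # count expected if a and b were independent
            observed = joint.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(left) - 1) * (len(right) - 1)
    return stat, dof


def soft_fd_strength(pairs):
    """Ratio of distinct a values to distinct (a, b) pairs, in (0, 1]."""
    return len({a for a, _ in pairs}) / len(set(pairs))


# Hypothetical toy table: each city belongs to exactly one country.
rows = [("Paris", "FR"), ("Lyon", "FR"), ("Berlin", "DE"), ("Munich", "DE")] * 25
sample = sample_pairs(rows, 40)
stat, dof = chi_squared(sample)
print(f"chi2={stat:.1f}, dof={dof}, fd_strength={soft_fd_strength(sample):.2f}")
```

A statistic well above the chi-squared critical value for the given degrees of freedom (about 7.81 at the 5% level for 3 degrees of freedom) suggests the columns are correlated, and a strength close to 1 suggests that the first column approximately determines the second.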

Original language: English
Title of host publication: Proceedings of the ACM SIGMOD International Conference on Management of Data
Editors: G. Weikum, A.C. Konig, S. Dessloch
Pages: 647-658
Number of pages: 12
Publication status: Published - 2004
Externally published: Yes
Event: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2004 - Paris, France
Duration: 13 Jun 2004 → 18 Jun 2004

Other

Other: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2004
Country: France
City: Paris
Period: 13/6/04 → 18/6/04

Fingerprint

Data mining
Statistics

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Ilyas, I. F., Markl, V., Haas, P., Brown, P., & Aboulnaga, A. (2004). CORDS: Automatic discovery of correlations and soft functional dependencies. In G. Weikum, A. C. Konig, & S. Dessloch (Eds.), Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 647-658)

CORDS: Automatic discovery of correlations and soft functional dependencies. / Ilyas, Ihab F.; Markl, Volker; Haas, Peter; Brown, Paul; Aboulnaga, Ashraf.

Proceedings of the ACM SIGMOD International Conference on Management of Data. ed. / G. Weikum; A.C. Konig; S. Dessloch. 2004. p. 647-658.

Ilyas, IF, Markl, V, Haas, P, Brown, P & Aboulnaga, A 2004, CORDS: Automatic discovery of correlations and soft functional dependencies. in G Weikum, AC Konig & S Dessloch (eds), Proceedings of the ACM SIGMOD International Conference on Management of Data. pp. 647-658, Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2004, Paris, France, 13/6/04.
Ilyas IF, Markl V, Haas P, Brown P, Aboulnaga A. CORDS: Automatic discovery of correlations and soft functional dependencies. In Weikum G, Konig AC, Dessloch S, editors, Proceedings of the ACM SIGMOD International Conference on Management of Data. 2004. p. 647-658
Ilyas, Ihab F.; Markl, Volker; Haas, Peter; Brown, Paul; Aboulnaga, Ashraf. / CORDS: Automatic discovery of correlations and soft functional dependencies. Proceedings of the ACM SIGMOD International Conference on Management of Data. editor / G. Weikum; A.C. Konig; S. Dessloch. 2004. pp. 647-658
@inproceedings{06aa797d693344d8a85d71f00706d4ad,
title = "CORDS: Automatic discovery of correlations and soft functional dependencies",
abstract = "The rich dependency structure found in the columns of real-world relational databases can be exploited to great advantage, but can also cause query optimizers - which usually assume that columns are statistically independent - to underestimate the selectivities of conjunctive predicates by orders of magnitude. We introduce CORDS, an efficient and scalable tool for automatic discovery of correlations and soft functional dependencies between columns. CORDS searches for column pairs that might have interesting and useful dependency relations by systematically enumerating candidate pairs and simultaneously pruning unpromising candidates using a flexible set of heuristics. A robust chi-squared analysis is applied to a sample of column values in order to identify correlations, and the number of distinct values in the sampled columns is analyzed to detect soft functional dependencies. CORDS can be used as a data mining tool, producing dependency graphs that are of intrinsic interest. We focus primarily on the use of CORDS in query optimization. Specifically, CORDS recommends groups of columns on which to maintain certain simple joint statistics. These",
author = "Ilyas, {Ihab F.} and Volker Markl and Peter Haas and Paul Brown and Ashraf Aboulnaga",
year = "2004",
language = "English",
pages = "647--658",
editor = "G. Weikum and A.C. Konig and S. Dessloch",
booktitle = "Proceedings of the ACM SIGMOD International Conference on Management of Data",
}

TY - GEN

T1 - CORDS

T2 - Automatic discovery of correlations and soft functional dependencies

AU - Ilyas, Ihab F.

AU - Markl, Volker

AU - Haas, Peter

AU - Brown, Paul

AU - Aboulnaga, Ashraf

PY - 2004

Y1 - 2004

N2 - The rich dependency structure found in the columns of real-world relational databases can be exploited to great advantage, but can also cause query optimizers - which usually assume that columns are statistically independent - to underestimate the selectivities of conjunctive predicates by orders of magnitude. We introduce CORDS, an efficient and scalable tool for automatic discovery of correlations and soft functional dependencies between columns. CORDS searches for column pairs that might have interesting and useful dependency relations by systematically enumerating candidate pairs and simultaneously pruning unpromising candidates using a flexible set of heuristics. A robust chi-squared analysis is applied to a sample of column values in order to identify correlations, and the number of distinct values in the sampled columns is analyzed to detect soft functional dependencies. CORDS can be used as a data mining tool, producing dependency graphs that are of intrinsic interest. We focus primarily on the use of CORDS in query optimization. Specifically, CORDS recommends groups of columns on which to maintain certain simple joint statistics. These

UR - http://www.scopus.com/inward/record.url?scp=3142708793&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=3142708793&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:3142708793

SP - 647

EP - 658

BT - Proceedings of the ACM SIGMOD International Conference on Management of Data

A2 - Weikum, G.

A2 - Konig, A.C.

A2 - Dessloch, S.

ER -