CORDS: Automatic discovery of correlations and soft functional dependencies

Ihab F. Ilyas, Volker Markl, Peter Haas, Paul Brown, Ashraf Aboulnaga

Research output: Contribution to journal › Conference article

175 Citations (Scopus)

Abstract

The rich dependency structure found in the columns of real-world relational databases can be exploited to great advantage, but can also cause query optimizers - which usually assume that columns are statistically independent - to underestimate the selectivities of conjunctive predicates by orders of magnitude. We introduce CORDS, an efficient and scalable tool for automatic discovery of correlations and soft functional dependencies between columns. CORDS searches for column pairs that might have interesting and useful dependency relations by systematically enumerating candidate pairs and simultaneously pruning unpromising candidates using a flexible set of heuristics. A robust chi-squared analysis is applied to a sample of column values in order to identify correlations, and the number of distinct values in the sampled columns is analyzed to detect soft functional dependencies. CORDS can be used as a data mining tool, producing dependency graphs that are of intrinsic interest. We focus primarily on the use of CORDS in query optimization. Specifically, CORDS recommends groups of columns on which to maintain certain simple joint statistics. These
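The two sample-based tests named in the abstract can be illustrated with a minimal sketch. This is not the CORDS implementation. It assumes a sample of value pairs from two columns has already been drawn, computes a plain Pearson chi-squared statistic (the abstract's "robust" variant is not reproduced here), normalizes it to a mean-square contingency score, and uses the ratio of distinct-value counts as a soft functional dependency score. The function names, the normalization, and the scoring ratio are illustrative assumptions, not details taken from this record.

```python
from collections import Counter


def chi_squared(pairs):
    """Pearson chi-squared statistic for a sample of (a, b) value pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    stat = 0.0
    for a, count_a in left.items():
        for b, count_b in right.items():
            # Expected cell count if the two columns were independent.
            expected = count_a * count_b / n
            observed = joint.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def mean_square_contingency(pairs):
    """Chi-squared scaled to [0, 1]; 0 means the sample looks independent."""
    n = len(pairs)
    distinct_a = len({a for a, _ in pairs})
    distinct_b = len({b for _, b in pairs})
    k = min(distinct_a, distinct_b) - 1
    if k <= 0:
        return 0.0  # a constant column carries no correlation signal
    return chi_squared(pairs) / (n * k)


def fd_strength(pairs):
    """Distinct values of the first column divided by distinct pairs.

    A value of 1.0 means every first-column value in the sample maps to a
    single second-column value; values near 1.0 suggest a soft dependency.
    """
    return len({a for a, _ in pairs}) / len(set(pairs))


# Hypothetical samples: one fully dependent pair of columns, one independent.
dependent = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
independent = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
```

On the dependent sample both scores reach 1.0, while on the independent sample the contingency score is 0.0 and the dependency score drops to 0.5. Any cut-off used to flag a column pair would be a tuning choice that this sketch does not fix.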

Original language: English
Pages (from-to): 647-658
Number of pages: 12
Journal: Proceedings of the ACM SIGMOD International Conference on Management of Data
Publication status: Published - 27 Jul 2004
Event: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2004 - Paris, France
Duration: 13 Jun 2004 to 18 Jun 2004

ASJC Scopus subject areas

  • Software
  • Information Systems
