Discovery of genuine functional dependencies from relational data with missing values

Laure Berti-Equille, Hazar Harmouch, Felix Naumann, Noel Novelli, Saravanan Thirumuruganathan

Research output: Contribution to journal, Conference article

4 Citations (Scopus)

Abstract

Functional dependencies (FDs) play an important role in maintaining data quality. They can be used to enforce data consistency and to guide repairs over a database. In this work, we investigate the problem of missing values and its impact on FD discovery. With existing FD discovery algorithms, some genuine FDs cannot be detected precisely because of missing values, while some non-genuine FDs are discovered even though they are merely artifacts of missing values under a certain NULL semantics. We define a notion of genuineness and propose algorithms to compute the genuineness score of a discovered FD. This score can be used to identify the genuine FDs among the set of all valid dependencies that hold on the data. We evaluate the quality of our method over various real-world and semi-synthetic datasets with extensive experiments. The results show that our method performs well for relatively large FD sets and is able to accurately capture genuine FDs.
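
To make the role of NULL semantics concrete, here is a minimal Python sketch. It is not the authors' algorithm and does not compute their genuineness score. It only checks whether a candidate FD holds on a small table under two common interpretations of missing values: NULL = NULL (all missing values count as one value) and NULL ≠ NULL (every missing value counts as distinct). The function name `fd_holds`, the `null_equals_null` flag, and the toy rows are assumptions introduced purely for illustration.

```python
from itertools import count


def fd_holds(rows, lhs, rhs, null_equals_null=True):
    """Return True if the FD lhs -> rhs holds on rows (a list of dicts).

    None marks a missing value. Hypothetical helper for illustration:
    with null_equals_null=True all None values compare equal; otherwise
    each None is replaced by a unique placeholder so it equals nothing else.
    """
    fresh = count()  # generates unique ids for the NULL != NULL case

    def norm(value):
        if value is None and not null_equals_null:
            return ("NULL", next(fresh))
        return value

    seen = {}  # maps each left-hand-side value combination to its right-hand side
    for row in rows:
        key = tuple(norm(row[a]) for a in lhs)
        val = tuple(norm(row[a]) for a in rhs)
        # A violation is two rows that agree on lhs but differ on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Toy data (invented for this example): two rows have a missing zip code.
rows = [
    {"zip": "10115", "city": "Berlin"},
    {"zip": None, "city": "Paris"},
    {"zip": None, "city": "Doha"},
]

holds_eq = fd_holds(rows, ["zip"], ["city"], null_equals_null=True)
holds_neq = fd_holds(rows, ["zip"], ["city"], null_equals_null=False)
```

On these toy rows, `zip -> city` is violated when the two missing zip codes are treated as equal, but it holds when each missing value is treated as distinct. Whether a discovery algorithm reports the dependency therefore depends on the chosen semantics alone, which is the kind of ambiguity the abstract describes.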

Original language: English
Pages (from-to): 880-892
Number of pages: 13
Journal: Proceedings of the VLDB Endowment
Volume: 11
Issue number: 8
DOI: 10.14778/3204028.3204032
Publication status: Published - 1 Jan 2018
Event: 44th International Conference on Very Large Data Bases, VLDB 2018 - Rio de Janeiro, Brazil
Duration: 27 Aug 2018 - 31 Aug 2018

Fingerprint

Repair
Semantics
Experiments

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

Discovery of genuine functional dependencies from relational data with missing values. / Berti-Equille, Laure; Harmouch, Hazar; Naumann, Felix; Novelli, Noel; Thirumuruganathan, Saravanan.

In: Proceedings of the VLDB Endowment, Vol. 11, No. 8, 01.01.2018, p. 880-892.

@article{13e81a5cbe4540839d27a8d0c5a308a6,
title = "Discovery of genuine functional dependencies from relational data with missing values",
abstract = "Functional dependencies (FDs) play an important role in maintaining data quality. They can be used to enforce data consistency and to guide repairs over a database. In this work, we investigate the problem of missing values and its impact on FD discovery. When using existing FD discovery algorithms, some genuine FDs could not be detected precisely due to missing values or some non-genuine FDs can be discovered even though they are caused by missing values with a certain NULL semantics. We define a notion of genuineness and propose algorithms to compute the genuineness score of a discovered FD. This can be used to identify the genuine FDs among the set of all valid dependencies that hold on the data. We evaluate the quality of our method over various real-world and semi-synthetic datasets with extensive experiments. The results show that our method performs well for relatively large FD sets and is able to accurately capture genuine FDs.",
author = "Laure Berti-Equille and Hazar Harmouch and Felix Naumann and Noel Novelli and Saravanan Thirumuruganathan",
year = "2018",
month = "1",
day = "1",
doi = "10.14778/3204028.3204032",
language = "English",
volume = "11",
pages = "880--892",
journal = "Proceedings of the VLDB Endowment",
issn = "2150-8097",
publisher = "Very Large Data Base Endowment Inc.",
number = "8"
}

TY - JOUR

T1 - Discovery of genuine functional dependencies from relational data with missing values

AU - Berti-Equille, Laure

AU - Harmouch, Hazar

AU - Naumann, Felix

AU - Novelli, Noel

AU - Thirumuruganathan, Saravanan

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Functional dependencies (FDs) play an important role in maintaining data quality. They can be used to enforce data consistency and to guide repairs over a database. In this work, we investigate the problem of missing values and its impact on FD discovery. When using existing FD discovery algorithms, some genuine FDs could not be detected precisely due to missing values or some non-genuine FDs can be discovered even though they are caused by missing values with a certain NULL semantics. We define a notion of genuineness and propose algorithms to compute the genuineness score of a discovered FD. This can be used to identify the genuine FDs among the set of all valid dependencies that hold on the data. We evaluate the quality of our method over various real-world and semi-synthetic datasets with extensive experiments. The results show that our method performs well for relatively large FD sets and is able to accurately capture genuine FDs.

AB - Functional dependencies (FDs) play an important role in maintaining data quality. They can be used to enforce data consistency and to guide repairs over a database. In this work, we investigate the problem of missing values and its impact on FD discovery. When using existing FD discovery algorithms, some genuine FDs could not be detected precisely due to missing values or some non-genuine FDs can be discovered even though they are caused by missing values with a certain NULL semantics. We define a notion of genuineness and propose algorithms to compute the genuineness score of a discovered FD. This can be used to identify the genuine FDs among the set of all valid dependencies that hold on the data. We evaluate the quality of our method over various real-world and semi-synthetic datasets with extensive experiments. The results show that our method performs well for relatively large FD sets and is able to accurately capture genuine FDs.

UR - http://www.scopus.com/inward/record.url?scp=85063943131&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85063943131&partnerID=8YFLogxK

U2 - 10.14778/3204028.3204032

DO - 10.14778/3204028.3204032

M3 - Conference article

VL - 11

SP - 880

EP - 892

JO - Proceedings of the VLDB Endowment

JF - Proceedings of the VLDB Endowment

SN - 2150-8097

IS - 8

ER -