Towards dependable data repairing with fixing rules

Jiannan Wang, Nan Tang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

51 Citations (Scopus)

Abstract

One of the main challenges that data cleaning systems face is to automatically identify and repair data errors in a dependable manner. Though data dependencies (a.k.a. integrity constraints) have been widely studied to capture errors in data, automated and dependable data repairing on these errors has remained a notoriously hard problem. In this work, we introduce an automated approach for dependably repairing data errors, based on a novel class of fixing rules. A fixing rule contains an evidence pattern, a set of negative patterns, and a fact value. The heart of fixing rules is deterministic: given a tuple, the evidence pattern and the negative patterns of a fixing rule are combined to precisely capture which attribute is wrong, and the fact indicates how to correct this error. We study several fundamental problems associated with fixing rules, and establish their complexity. We develop efficient algorithms to check whether a set of fixing rules is consistent, and discuss approaches to resolve inconsistent fixing rules. We also devise efficient algorithms for repairing data errors using fixing rules. We experimentally demonstrate that our techniques outperform other automated algorithms in terms of the accuracy of repairing data errors, using both real-life and synthetic data.
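The structure described in the abstract (an evidence pattern, a set of negative patterns, and a fact value) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class name `FixingRule`, the function `repair`, the dictionary representation of tuples, and the country/capital sample data are all assumptions made for illustration. The "correct each attribute at most once" policy in `repair` is also an assumption, added only so the sketch is guaranteed to terminate.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable

Row = Dict[str, str]  # assumed representation: attribute name -> value


@dataclass(frozen=True)
class FixingRule:
    """Hypothetical encoding of a fixing rule as described in the abstract."""
    evidence: Dict[str, str]   # evidence pattern: attributes assumed correct
    target: str                # the attribute the rule may correct
    negatives: FrozenSet[str]  # negative patterns: known-wrong target values
    fact: str                  # the value to write when the rule fires

    def __post_init__(self) -> None:
        # A fact that is itself a negative pattern would make the rule
        # contradict its own correction.
        if self.fact in self.negatives:
            raise ValueError("fact must not be one of the negative patterns")

    def matches(self, row: Row) -> bool:
        # Fires only when the evidence matches AND the target holds a
        # known-wrong value, so the decision is deterministic.
        evidence_ok = all(row.get(a) == v for a, v in self.evidence.items())
        return evidence_ok and row.get(self.target) in self.negatives


def repair(row: Row, rules: Iterable[FixingRule]) -> Row:
    """Apply rules until none fires; each attribute is corrected at most once
    (an assumption of this sketch, so the loop always terminates)."""
    rules = list(rules)
    current = dict(row)        # never mutate the caller's tuple
    corrected = set()
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.target not in corrected and rule.matches(current):
                current[rule.target] = rule.fact
                corrected.add(rule.target)
                changed = True
    return current


# Illustrative data only: if country is "China" and capital is one of the
# listed wrong values, set capital to "Beijing".
rule = FixingRule(
    evidence={"country": "China"},
    target="capital",
    negatives=frozenset({"Shanghai", "Hongkong"}),
    fact="Beijing",
)
print(repair({"country": "China", "capital": "Shanghai"}, [rule]))
```

A tuple whose capital is neither the fact nor a listed negative pattern (for example "Tokyo") is left untouched. This conservative behavior is what makes the rule precise about which attribute is wrong: it acts only on errors it has been told about.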

Original language: English
Title of host publication: Proceedings of the ACM SIGMOD International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 457-468
Number of pages: 12
ISBN (Print): 9781450323765
DOI: https://doi.org/10.1145/2588555.2610494
Publication status: Published - 1 Jan 2014
Event: 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD 2014 - Snowbird, UT, United States
Duration: 22 Jun 2014 to 27 Jun 2014

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Wang, J., & Tang, N. (2014). Towards dependable data repairing with fixing rules. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 457-468). Association for Computing Machinery. https://doi.org/10.1145/2588555.2610494

@inproceedings{95a6f13f8fe24f21a999870a415b02ff,
title = "Towards dependable data repairing with fixing rules",
author = "Jiannan Wang and Nan Tang",
year = "2014",
month = "1",
day = "1",
doi = "10.1145/2588555.2610494",
language = "English",
isbn = "9781450323765",
pages = "457--468",
booktitle = "Proceedings of the ACM SIGMOD International Conference on Management of Data",
publisher = "Association for Computing Machinery",

}

TY - GEN

T1 - Towards dependable data repairing with fixing rules

AU - Wang, Jiannan

AU - Tang, Nan

PY - 2014/1/1

Y1 - 2014/1/1

UR - http://www.scopus.com/inward/record.url?scp=84904293819&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84904293819&partnerID=8YFLogxK

U2 - 10.1145/2588555.2610494

DO - 10.1145/2588555.2610494

M3 - Conference contribution

SN - 9781450323765

SP - 457

EP - 468

BT - Proceedings of the ACM SIGMOD International Conference on Management of Data

PB - Association for Computing Machinery

ER -