Dependable data repairing with fixing rules

Jiannan Wang, Nan Tang

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

One of the main challenges that data-cleaning systems face is to automatically identify and repair data errors in a dependable manner. Though data dependencies (also known as integrity constraints) have been widely studied to capture errors in data, automated and dependable repair of these errors has remained a notoriously difficult problem. In this work, we introduce an automated approach for dependably repairing data errors, based on a novel class of fixing rules. A fixing rule contains an evidence pattern, a set of negative patterns, and a fact value. The heart of fixing rules is deterministic: given a tuple, the evidence pattern and the negative patterns of a fixing rule are combined to precisely capture which attribute is wrong, and the fact indicates how to correct this error. We study several fundamental problems associated with fixing rules and establish their complexity. We develop efficient algorithms to check whether a set of fixing rules is consistent and discuss approaches to resolve inconsistent fixing rules. We also devise efficient algorithms for repairing data errors using fixing rules. Moreover, we discuss how to generate a large number of fixing rules from examples or available knowledge bases. We experimentally demonstrate that our techniques outperform other automated algorithms in terms of the accuracy of repairing data errors, using both real-life and synthetic data.
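The rule structure described in the abstract (an evidence pattern, a set of negative patterns, and a fact) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `FixingRule` and `apply_rule`, the dictionary representation of a tuple, and the country/capital data are all assumptions made for illustration. It assumes a rule fires only when the tuple matches the evidence exactly and the target attribute holds one of the known-wrong values.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass
class FixingRule:
    """Hypothetical encoding of one fixing rule (names are illustrative)."""
    evidence: Dict[str, str]    # attribute -> value the tuple must match exactly
    target: str                 # the attribute this rule is allowed to correct
    negatives: FrozenSet[str]   # known-wrong values for the target attribute
    fact: str                   # the value written when the rule fires


def apply_rule(rule: FixingRule, row: Dict[str, str]) -> Dict[str, str]:
    """Return a repaired copy of `row` if the rule fires, else `row` unchanged."""
    matches_evidence = all(row.get(a) == v for a, v in rule.evidence.items())
    if matches_evidence and row.get(rule.target) in rule.negatives:
        fixed = dict(row)            # copy so the input tuple is not mutated
        fixed[rule.target] = rule.fact
        return fixed
    return row                       # no evidence match, or value not known to be wrong


# Illustrative data only: for country "China", the capital values
# "Shanghai" and "Hongkong" are treated as wrong and replaced by "Beijing".
rule = FixingRule(
    evidence={"country": "China"},
    target="capital",
    negatives=frozenset({"Shanghai", "Hongkong"}),
    fact="Beijing",
)

repaired = apply_rule(rule, {"country": "China", "capital": "Shanghai"})
untouched = apply_rule(rule, {"country": "China", "capital": "Nanjing"})
```

In this sketch the rule stays conservative: a value that is not listed among the negative patterns is left alone, even if it differs from the fact, which reflects the deterministic behaviour the abstract describes.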

Original language: English
Article number: 16
Journal: Journal of Data and Information Quality
Volume: 8
Issue number: 3-4
DOI: 10.1145/3041761
Publication status: Published - 1 Jun 2017

Fingerprint

Cleaning
Repair
Knowledge base
Data cleaning
Integrity

Keywords

  • Data repairing
  • Dependable
  • Fixing rules

ASJC Scopus subject areas

  • Information Systems
  • Information Systems and Management

Cite this

Dependable data repairing with fixing rules. / Wang, Jiannan; Tang, Nan.

In: Journal of Data and Information Quality, Vol. 8, No. 3-4, 16, 01.06.2017.

@article{88b10f5f2c6549a6885892d5e83fc9f2,
title = "Dependable data repairing with fixing rules",
abstract = "One of the main challenges that data-cleaning systems face is to automatically identify and repair data errors in a dependable manner. Though data dependencies (also known as integrity constraints) have been widely studied to capture errors in data, automated and dependable data repairing on these errors has remained a notoriously difficult problem. In this work, we introduce an automated approach for dependably repairing data errors, based on a novel class of fixing rules. A fixing rule contains an evidence pattern, a set of negative patterns, and a fact value. The heart of fixing rules is deterministic: given a tuple, the evidence pattern and the negative patterns of a fixing rule are combined to precisely capture which attribute is wrong, and the fact indicates how to correct this error. We study several fundamental problems associated with fixing rules and establish their complexity. We develop efficient algorithms to check whether a set of fixing rules are consistent and discuss approaches to resolve inconsistent fixing rules. We also devise efficient algorithms for repairing data errors using fixing rules. Moreover, we discuss approaches on how to generate a large number of fixing rules from examples or available knowledge bases. We experimentally demonstrate that our techniques outperform other automated algorithms in terms of the accuracy of repairing data errors, using both real-life and synthetic data.",
keywords = "Data repairing, Dependable, Fixing rules",
author = "Jiannan Wang and Nan Tang",
year = "2017",
month = "6",
day = "1",
doi = "10.1145/3041761",
language = "English",
volume = "8",
journal = "Journal of Data and Information Quality",
issn = "1936-1955",
publisher = "Association for Computing Machinery (ACM)",
number = "3-4"
}

TY - JOUR

T1 - Dependable data repairing with fixing rules

AU - Wang, Jiannan

AU - Tang, Nan

PY - 2017/6/1

Y1 - 2017/6/1

AB - One of the main challenges that data-cleaning systems face is to automatically identify and repair data errors in a dependable manner. Though data dependencies (also known as integrity constraints) have been widely studied to capture errors in data, automated and dependable data repairing on these errors has remained a notoriously difficult problem. In this work, we introduce an automated approach for dependably repairing data errors, based on a novel class of fixing rules. A fixing rule contains an evidence pattern, a set of negative patterns, and a fact value. The heart of fixing rules is deterministic: given a tuple, the evidence pattern and the negative patterns of a fixing rule are combined to precisely capture which attribute is wrong, and the fact indicates how to correct this error. We study several fundamental problems associated with fixing rules and establish their complexity. We develop efficient algorithms to check whether a set of fixing rules are consistent and discuss approaches to resolve inconsistent fixing rules. We also devise efficient algorithms for repairing data errors using fixing rules. Moreover, we discuss approaches on how to generate a large number of fixing rules from examples or available knowledge bases. We experimentally demonstrate that our techniques outperform other automated algorithms in terms of the accuracy of repairing data errors, using both real-life and synthetic data.

KW - Data repairing

KW - Dependable

KW - Fixing rules

UR - http://www.scopus.com/inward/record.url?scp=85024093554&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85024093554&partnerID=8YFLogxK

U2 - 10.1145/3041761

DO - 10.1145/3041761

M3 - Article

VL - 8

JO - Journal of Data and Information Quality

JF - Journal of Data and Information Quality

SN - 1936-1955

IS - 3-4

M1 - 16

ER -