Towards certain fixes with editing rules and master data

Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, Wenyuan Yu

Research output: Contribution to journal › Article

66 Citations (Scopus)

Abstract

A variety of integrity constraints have been studied for data cleaning. While these constraints can detect the presence of errors, they fall short of guiding us to correct the errors. Indeed, data repairing based on these constraints may not find certain fixes that are guaranteed correct, and worse still, may even introduce new errors when attempting to repair the data. We propose a method for finding certain fixes, based on master data, a notion of certain regions, and a class of editing rules. A certain region is a set of attributes that are assured correct by the users. Given a certain region and master data, editing rules tell us what attributes to fix and how to update them. We show how the method can be used in data monitoring and enrichment. We also develop techniques for reasoning about editing rules, to decide whether they lead to a unique fix and whether they are able to fix all the attributes in a tuple, relative to master data and a certain region. Furthermore, we present a framework and an algorithm to find certain fixes, by interacting with the users to ensure that one of the certain regions is correct. We experimentally verify the effectiveness and scalability of the algorithm.
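
The abstract describes editing rules only at a high level. The following is a minimal Python sketch under our own simplifying assumptions, not the authors' rule language or algorithm. A rule is modeled here as matching some attributes of an input tuple against master data and copying one master value into the tuple. A rule fires only when its matched attributes are already in the certain region, and each attribute it fixes joins that region. The names `EditingRule` and `apply_rules`, and all sample data, are hypothetical. The sketch also clears a `unique` flag when candidate fixes conflict, and compares the final certain set with the tuple's attributes to show whether every attribute was fixed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditingRule:
    """Hypothetical simplified rule: when t[lhs] equals a master tuple's
    values on master_lhs, copy that tuple's master_rhs value into t[rhs]."""
    lhs: tuple[str, ...]         # attributes of the input tuple to match on
    master_lhs: tuple[str, ...]  # corresponding attributes in the master data
    rhs: str                     # input attribute the rule fixes
    master_rhs: str              # master attribute the fix is copied from


def apply_rules(t, region, rules, master):
    """Apply rules to a fixpoint, starting from a user-assured region.

    Returns (fixed tuple, set of certain attributes, unique flag). The flag is
    cleared whenever two candidate fixes for one attribute disagree, or a fix
    would overwrite a value the user assured correct.
    """
    t = dict(t)
    certain = set(region)
    assigned = {}  # attribute -> value set by some rule
    unique = True
    changed = True
    while changed:
        changed = False
        for r in rules:
            if not set(r.lhs) <= certain:
                continue  # fire only on attributes already known to be correct
            key = tuple(t[a] for a in r.lhs)
            for tm in master:
                if tuple(tm[a] for a in r.master_lhs) != key:
                    continue
                value = tm[r.master_rhs]
                if r.rhs in region:
                    if t[r.rhs] != value:
                        unique = False  # contradicts a user-assured value
                elif r.rhs in assigned:
                    if assigned[r.rhs] != value:
                        unique = False  # two master matches disagree
                else:
                    assigned[r.rhs] = value
                    t[r.rhs] = value
                    certain.add(r.rhs)
                    changed = True
    return t, certain, unique


# Hypothetical master data, rules, and dirty input tuple.
master = [{"zip": "EH7 4AH", "city": "Edinburgh", "AC": "131",
           "phn": "5551234", "name": "R. Smith"}]
rules = [
    EditingRule(("zip",), ("zip",), "city", "city"),
    EditingRule(("zip",), ("zip",), "AC", "AC"),
    EditingRule(("AC", "phn"), ("AC", "phn"), "name", "name"),
]
t0 = {"zip": "EH7 4AH", "city": "London", "AC": "020",
      "phn": "5551234", "name": "Rob Smith"}
region = {"zip", "phn"}  # attributes the user assures are correct

fixed, certain, unique = apply_rules(t0, region, rules, master)
print(fixed, unique, certain == set(t0))  # last value: were all attributes fixed?
```

On this data the two zip rules fix city and AC, which in turn lets the third rule fix name, so all five attributes end up certain with no conflict. Adding a second master tuple with the same zip but a different city clears the `unique` flag. This naive per-tuple loop only illustrates the idea; the abstract states that the paper develops its own techniques for deciding uniqueness and full coverage relative to master data and a certain region.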

Original language: English
Pages (from-to): 213-238
Number of pages: 26
Journal: VLDB Journal
Volume: 21
Issue number: 2
DOI: 10.1007/s00778-011-0253-7
Publication status: Published - 1 Apr 2012
Externally published: Yes

Keywords

  • Certain fix
  • Data cleaning
  • Data quality
  • Editing rule
  • Master data

ASJC Scopus subject areas

  • Hardware and Architecture
  • Information Systems

Cite this

Towards certain fixes with editing rules and master data. / Fan, Wenfei; Li, Jianzhong; Ma, Shuai; Tang, Nan; Yu, Wenyuan.

In: VLDB Journal, Vol. 21, No. 2, 01.04.2012, p. 213-238.

Fan, Wenfei ; Li, Jianzhong ; Ma, Shuai ; Tang, Nan ; Yu, Wenyuan. / Towards certain fixes with editing rules and master data. In: VLDB Journal. 2012 ; Vol. 21, No. 2. pp. 213-238.

@article{dc16d08f911846679f009fa1ca6a43da,
title = "Towards certain fixes with editing rules and master data",
keywords = "Certain fix, Data cleaning, Data quality, Editing rule, Master data",
author = "Wenfei Fan and Jianzhong Li and Shuai Ma and Nan Tang and Wenyuan Yu",
year = "2012",
month = "4",
day = "1",
doi = "10.1007/s00778-011-0253-7",
language = "English",
volume = "21",
pages = "213--238",
journal = "VLDB Journal",
issn = "1066-8888",
publisher = "Springer New York",
number = "2",

}

TY - JOUR

T1 - Towards certain fixes with editing rules and master data

AU - Fan, Wenfei

AU - Li, Jianzhong

AU - Ma, Shuai

AU - Tang, Nan

AU - Yu, Wenyuan

PY - 2012/4/1

Y1 - 2012/4/1

AB - A variety of integrity constraints have been studied for data cleaning. While these constraints can detect the presence of errors, they fall short of guiding us to correct the errors. Indeed, data repairing based on these constraints may not find certain fixes that are guaranteed correct, and worse still, may even introduce new errors when attempting to repair the data. We propose a method for finding certain fixes, based on master data, a notion of certain regions, and a class of editing rules. A certain region is a set of attributes that are assured correct by the users. Given a certain region and master data, editing rules tell us what attributes to fix and how to update them. We show how the method can be used in data monitoring and enrichment. We also develop techniques for reasoning about editing rules, to decide whether they lead to a unique fix and whether they are able to fix all the attributes in a tuple, relative to master data and a certain region. Furthermore, we present a framework and an algorithm to find certain fixes, by interacting with the users to ensure that one of the certain regions is correct. We experimentally verify the effectiveness and scalability of the algorithm.

KW - Certain fix

KW - Data cleaning

KW - Data quality

KW - Editing rule

KW - Master data

UR - http://www.scopus.com/inward/record.url?scp=84858614433&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84858614433&partnerID=8YFLogxK

U2 - 10.1007/s00778-011-0253-7

DO - 10.1007/s00778-011-0253-7

M3 - Article

VL - 21

SP - 213

EP - 238

JO - VLDB Journal

JF - VLDB Journal

SN - 1066-8888

IS - 2

ER -