Towards certain fixes with editing rules and master data

Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, Wenyuan Yu

Research output: Chapter in Book/Report/Conference proceeding › Chapter

77 Citations (Scopus)

Abstract

A variety of integrity constraints have been studied for data cleaning. While these constraints can detect the presence of errors, they fall short of guiding us to correct the errors. Indeed, data repairing based on these constraints may not find certain fixes that are absolutely correct, and worse, may introduce new errors when repairing the data. We propose a method for finding certain fixes, based on master data, a notion of certain regions, and a class of editing rules. A certain region is a set of attributes that are assured correct by the users. Given a certain region and master data, editing rules tell us what attributes to fix and how to update them. We show how the method can be used in data monitoring and enrichment. We develop techniques for reasoning about editing rules, to decide whether they lead to a unique fix and whether they are able to fix all the attributes in a tuple, relative to master data and a certain region. We also provide an algorithm to identify minimal certain regions, such that a certain fix is warranted by editing rules and master data as long as one of the regions is correct. We experimentally verify the effectiveness and scalability of the algorithm.
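The interplay of certain regions, master data, and editing rules described in the abstract can be illustrated with a small sketch. It is a simplified illustration under stated assumptions, not the authors' algorithm: the abstract does not give the formal rule syntax, so the `EditingRule` class, the `apply_rules` function, and the sample attribute names (`zip`, `city`, `AC`, `area_code`) are all hypothetical. The sketch assumes a rule fires only when every attribute it matches on is already assured correct, and that it copies a value from master data only when all matching master tuples agree on that value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditingRule:
    """Hypothetical rule shape (an assumption, not the paper's syntax):
    if record[a] == master_row[m] for every (a, m) in `match`,
    then record[dst] should take the value master_row[src]."""
    match: tuple[tuple[str, str], ...]  # (record attribute, master attribute)
    dst: str                            # record attribute to fix
    src: str                            # master attribute supplying the value


def apply_rules(record, certain_region, rules, master):
    """Propagate fixes from the assured attributes until nothing changes.

    Returns the repaired record and the set of attributes that are
    assured correct afterwards.
    """
    fixed = dict(record)
    assured = set(certain_region)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # Skip rules whose target is already assured, or whose
            # premise relies on attributes that are not yet assured.
            if rule.dst in assured:
                continue
            if not all(a in assured for a, _ in rule.match):
                continue
            # Collect the values proposed by all matching master rows.
            candidates = {
                row[rule.src]
                for row in master
                if all(fixed.get(a) == row.get(m) for a, m in rule.match)
            }
            # Only a unique candidate is applied; conflicting or missing
            # master values leave the attribute untouched.
            if len(candidates) == 1:
                fixed[rule.dst] = candidates.pop()
                assured.add(rule.dst)
                changed = True
    return fixed, assured


# Illustrative data (invented for this sketch).
master = [{"zip": "EH8 9AB", "city": "Edinburgh", "AC": "131"}]
rules = [
    EditingRule(match=(("zip", "zip"),), dst="city", src="city"),
    EditingRule(match=(("city", "city"),), dst="area_code", src="AC"),
]
record = {"zip": "EH8 9AB", "city": "London", "area_code": "020"}

fixed, assured = apply_rules(record, {"zip"}, rules, master)
print(fixed)
print(sorted(assured))
```

In this example only `zip` is assured by the user. The first rule fixes `city` from master data, which in turn makes the second rule applicable to `area_code`, so the whole tuple ends up covered. If two master rows with the same `zip` disagreed on `city`, the sketch would leave `city` unchanged, loosely mirroring the abstract's concern with whether rules lead to a unique fix.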

Original language: English
Title of host publication: Proceedings of the VLDB Endowment
Pages: 173-184
Number of pages: 12
Volume: 3
Edition: 1
Publication status: Published - Sep 2010
Externally published: Yes

Fingerprint

Scalability
Cleaning
Monitoring

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

Fan, W., Li, J., Ma, S., Tang, N., & Yu, W. (2010). Towards certain fixes with editing rules and master data. In Proceedings of the VLDB Endowment (1 ed., Vol. 3, pp. 173-184)

@inbook{bb5c58d5f9a644cfaa30f701c7639e3d,
title = "Towards certain fixes with editing rules and master data",
author = "Wenfei Fan and Jianzhong Li and Shuai Ma and Nan Tang and Wenyuan Yu",
year = "2010",
month = "9",
language = "English",
volume = "3",
pages = "173--184",
booktitle = "Proceedings of the VLDB Endowment",
edition = "1",

}

TY - CHAP

T1 - Towards certain fixes with editing rules and master data

AU - Fan, Wenfei

AU - Li, Jianzhong

AU - Ma, Shuai

AU - Tang, Nan

AU - Yu, Wenyuan

PY - 2010/9

Y1 - 2010/9

UR - http://www.scopus.com/inward/record.url?scp=84858615261&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84858615261&partnerID=8YFLogxK

M3 - Chapter

AN - SCOPUS:84858615261

VL - 3

SP - 173

EP - 184

BT - Proceedings of the VLDB Endowment

ER -