CerFix: A system for cleaning data with certain fixes

Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, Wenyuan Yu

Research output: Contribution to journal (Article)

10 Citations (Scopus)

Abstract

We present CerFix, a data cleaning system that finds certain fixes for tuples at the point of data entry, i.e., fixes that are guaranteed correct. It is based on master data, editing rules and certain regions. Given some attributes of an input tuple that are validated (assured correct), editing rules tell us what other attributes to fix and how to correct them with master data. A certain region is a set of attributes that, if validated, warrant a certain fix for the entire tuple. We demonstrate the following facilities provided by CerFix: (1) a region finder to identify certain regions; (2) a data monitor to find certain fixes for input tuples, by guiding users to validate a minimal number of attributes; and (3) an auditing module to show what attributes are fixed and where the correct values come from.
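The interplay of validated attributes, editing rules and master data described above can be illustrated with a minimal sketch. This is not the CerFix implementation: the `EditingRule` class, the `certain_fix` function, the attribute names and the sample master rows are all hypothetical, and the rule semantics are deliberately simplified (a rule fires only when its premise attributes are validated and the matching master rows agree on a single value). The returned provenance map loosely mirrors what an auditing view might report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditingRule:
    # Hypothetical, simplified rule: if the input tuple agrees with a master
    # tuple on every (input_attr, master_attr) pair in `match`, copy the
    # master value of fix[1] into the input attribute fix[0].
    match: tuple
    fix: tuple


def certain_fix(record, validated, rules, master):
    """Apply rules until no more attributes can be fixed.

    Returns the fixed record, the final validated set, and a provenance
    map from each fixed attribute to (master row index, master attribute).
    """
    fixed = dict(record)
    validated = set(validated)
    provenance = {}
    changed = True
    while changed:
        changed = False
        for rule in rules:
            target, source = rule.fix
            # Only fire a rule whose premise attributes are all validated.
            if target in validated or any(a not in validated for a, _ in rule.match):
                continue
            rows = [i for i, m in enumerate(master)
                    if all(m[ma] == fixed[a] for a, ma in rule.match)]
            values = {master[i][source] for i in rows}
            # Fix only when the master data determines a single value.
            if len(values) == 1:
                fixed[target] = values.pop()
                validated.add(target)
                provenance[target] = (rows[0], source)
                changed = True
    return fixed, validated, provenance


# Made-up master data and rules, for illustration only.
master = [
    {"zip": "EH8 9AB", "city": "Edinburgh", "area_code": "131"},
    {"zip": "NW1 6XE", "city": "London", "area_code": "020"},
]
rules = [
    EditingRule(match=(("zip", "zip"),), fix=("city", "city")),
    EditingRule(match=(("city", "city"),), fix=("AC", "area_code")),
]
record = {"zip": "EH8 9AB", "city": "Ldn", "AC": "020"}
fixed, validated, provenance = certain_fix(record, {"zip"}, rules, master)
```

In this toy setup, validating `zip` alone is enough for the rules to fix every other attribute, so `{"zip"}` plays the role of a certain region for the tuple.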

Original language: English
Pages (from-to): 1375-1378
Number of pages: 4
Journal: Proceedings of the VLDB Endowment
Volume: 4
Issue number: 12
Publication status: Published - 1 Aug 2011
Externally published: Yes

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

Fan, W., Li, J., Ma, S., Tang, N., & Yu, W. (2011). CerFix: A system for cleaning data with certain fixes. Proceedings of the VLDB Endowment, 4(12), 1375-1378.