Cleaning relations using knowledge bases

Shuang Hao, Nan Tang, Guoliang Li, Jian Li

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

8 Citations (Scopus)

Abstract

We study the data cleaning problem of detecting and repairing wrong relational data, as well as marking correct data, using well-curated knowledge bases (KBs). We propose detective rules (DRs), a new type of data cleaning rule that can make actionable decisions on relational data by building connections between a relation and a KB. The main invention is that a DR simultaneously models two opposite semantics of a relation using types and relationships in a KB: the positive semantics, which explains how attribute values are linked to each other in correct tuples, and the negative semantics, which indicates how wrong attribute values are connected to other correct attribute values within the same tuples. Naturally, a DR can mark correct values in a tuple if it matches the positive semantics. Meanwhile, a DR can detect/repair an error if it matches the negative semantics. We study fundamental problems associated with DRs, e.g., rule generation and rule consistency. We present efficient algorithms to apply DRs to clean a relation, based on rule order selection and inverted indexes. Extensive experiments, using both real-world and synthetic datasets, verify the effectiveness and efficiency of applying DRs in practice.
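As a rough illustration of the positive and negative semantics described in the abstract, the following is a minimal sketch and not the authors' implementation. The toy KB, the relationship names (worksAt, locatedIn, bornIn), the (name, institution, city) schema, and the function names are all hypothetical and chosen only for this example. The rule treats "institution locatedIn city" as the positive semantics and "name bornIn city" as the negative semantics for the city attribute.

```python
# Toy knowledge base: a set of (subject, relationship, object) triples.
# Every entity and relationship name below is hypothetical.
KB = {
    ("Alice", "worksAt", "Example University"),
    ("Example University", "locatedIn", "Springfield"),
    ("Alice", "bornIn", "Shelbyville"),
}


def objects(subject, relationship):
    """Return all KB objects linked to `subject` via `relationship`."""
    return {o for (s, r, o) in KB if s == subject and r == relationship}


def apply_rule(row):
    """Apply one illustrative rule to the 'city' attribute of a tuple.

    Returns (status, row) where status is 'correct', 'repaired', or 'unknown'.
    """
    name, inst, city = row["name"], row["institution"], row["city"]

    # Evidence: the rule only applies if the KB links name and institution.
    if inst not in objects(name, "worksAt"):
        return "unknown", row

    located = objects(inst, "locatedIn")

    # Positive semantics: the city is where the institution is located.
    if city in located:
        return "correct", row

    # Negative semantics: the city is the person's birthplace instead.
    # Repair only when the KB offers exactly one candidate value.
    if city in objects(name, "bornIn") and len(located) == 1:
        return "repaired", {**row, "city": next(iter(located))}

    # Neither semantics matched, so the rule makes no decision.
    return "unknown", row
```

In this sketch, a tuple matching neither semantics is left untouched, which mirrors the idea that a rule only acts when the KB gives it enough evidence to decide.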

Original language: English
Title of host publication: Proceedings - 2017 IEEE 33rd International Conference on Data Engineering, ICDE 2017
Publisher: IEEE Computer Society
Pages: 933-944
Number of pages: 12
ISBN (Electronic): 9781509065431
DOIs: https://doi.org/10.1109/ICDE.2017.141
Publication status: Published - 16 May 2017
Event: 33rd IEEE International Conference on Data Engineering, ICDE 2017 - San Diego, United States
Duration: 19 Apr 2017 to 22 Apr 2017

Other

Other: 33rd IEEE International Conference on Data Engineering, ICDE 2017
Country: United States
City: San Diego
Period: 19/4/17 to 22/4/17

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems

Cite this

Hao, S., Tang, N., Li, G., & Li, J. (2017). Cleaning relations using knowledge bases. In Proceedings - 2017 IEEE 33rd International Conference on Data Engineering, ICDE 2017 (pp. 933-944). [7930037] IEEE Computer Society. https://doi.org/10.1109/ICDE.2017.141