Data quality problems beyond consistency and deduplication

Wenfei Fan, Floris Geerts, Shuai Ma, Nan Tang, Wenyuan Yu

Research output: Chapter in Book/Report/Conference proceeding › Chapter

9 Citations (Scopus)

Abstract

Recent work on data quality has primarily focused on data repairing algorithms for improving data consistency and record matching methods for data deduplication. This paper accentuates several other challenging issues that are essential to developing data cleaning systems, namely, error correction with performance guarantees, unification of data repairing and record matching, relative information completeness, and data currency. We provide an overview of recent advances in the study of these issues, and advocate the need for developing a logical framework for a uniform treatment of these issues.

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 237-249
Number of pages: 13
Volume: 8000
DOI: https://doi.org/10.1007/978-3-642-41660-6-12
Publication status: Published - 1 Dec 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8000
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Fingerprint

Data Quality
Error correction
Cleaning
Data Consistency
Performance Guarantee
Currency
Unification
Completeness

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Fan, W., Geerts, F., Ma, S., Tang, N., & Yu, W. (2013). Data quality problems beyond consistency and deduplication. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8000, pp. 237-249). https://doi.org/10.1007/978-3-642-41660-6-12

Data quality problems beyond consistency and deduplication. / Fan, Wenfei; Geerts, Floris; Ma, Shuai; Tang, Nan; Yu, Wenyuan.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 8000, 2013. p. 237-249.

Fan, W, Geerts, F, Ma, S, Tang, N & Yu, W 2013, Data quality problems beyond consistency and deduplication. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8000, pp. 237-249. https://doi.org/10.1007/978-3-642-41660-6-12
Fan W, Geerts F, Ma S, Tang N, Yu W. Data quality problems beyond consistency and deduplication. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 8000. 2013. p. 237-249. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-642-41660-6-12
Fan, Wenfei ; Geerts, Floris ; Ma, Shuai ; Tang, Nan ; Yu, Wenyuan. / Data quality problems beyond consistency and deduplication. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 8000 2013. pp. 237-249 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inbook{6e1ffb2494734fe5b08ca1ffb6c3a899,
title = "Data quality problems beyond consistency and deduplication",
abstract = "Recent work on data quality has primarily focused on data repairing algorithms for improving data consistency and record matching methods for data deduplication. This paper accentuates several other challenging issues that are essential to developing data cleaning systems, namely, error correction with performance guarantees, unification of data repairing and record matching, relative information completeness, and data currency. We provide an overview of recent advances in the study of these issues, and advocate the need for developing a logical framework for a uniform treatment of these issues.",
author = "Wenfei Fan and Floris Geerts and Shuai Ma and Nan Tang and Wenyuan Yu",
year = "2013",
month = "12",
day = "1",
doi = "10.1007/978-3-642-41660-6-12",
language = "English",
isbn = "9783642416590",
volume = "8000",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "237--249",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
}

TY - CHAP

T1 - Data quality problems beyond consistency and deduplication

AU - Fan, Wenfei

AU - Geerts, Floris

AU - Ma, Shuai

AU - Tang, Nan

AU - Yu, Wenyuan

PY - 2013/12/1

Y1 - 2013/12/1

N2 - Recent work on data quality has primarily focused on data repairing algorithms for improving data consistency and record matching methods for data deduplication. This paper accentuates several other challenging issues that are essential to developing data cleaning systems, namely, error correction with performance guarantees, unification of data repairing and record matching, relative information completeness, and data currency. We provide an overview of recent advances in the study of these issues, and advocate the need for developing a logical framework for a uniform treatment of these issues.

AB - Recent work on data quality has primarily focused on data repairing algorithms for improving data consistency and record matching methods for data deduplication. This paper accentuates several other challenging issues that are essential to developing data cleaning systems, namely, error correction with performance guarantees, unification of data repairing and record matching, relative information completeness, and data currency. We provide an overview of recent advances in the study of these issues, and advocate the need for developing a logical framework for a uniform treatment of these issues.

UR - http://www.scopus.com/inward/record.url?scp=84892779233&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84892779233&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-41660-6-12

DO - 10.1007/978-3-642-41660-6-12

M3 - Chapter

SN - 9783642416590

VL - 8000

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 237

EP - 249

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -