NADEEF: A generalized data cleaning system

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

22 Citations (Scopus)

Abstract

We present NADEEF, an extensible, generic and easy-to-deploy data cleaning system. NADEEF distinguishes between a programming interface and a core to achieve generality and extensibility. The programming interface allows users to specify data quality rules by writing code that implements predefined classes. These classes uniformly define what is wrong with the data and (possibly) how to fix it. We will demonstrate the following features provided by NADEEF. (1) Heterogeneity: The programming interface can be used to express many types of data quality rules beyond the well-known CFDs (FDs), MDs and ETL rules. (2) Interdependency: The core algorithms can interleave multiple types of rules to detect and repair data errors. (3) Deployment and extensibility: Users can easily customize NADEEF by defining new types of rules, or by extending the core. (4) Metadata management and data custodians: We show a live data quality dashboard to effectively involve users in the data cleaning process.
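The split between a rule-writing interface and a rule-agnostic core can be illustrated with a minimal Python sketch. This is not NADEEF's actual API: the names `Rule`, `Violation`, `Fix`, `FunctionalDependency` and `clean`, the majority-vote repair, and the sample rows are all assumptions made for illustration. The sketch only shows the general idea from the abstract, namely that a rule class states what is wrong (`detect`) and, optionally, how to fix it (`repair`), while a generic loop applies any mix of rules.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    """Cells, as (row index, column) pairs, that jointly break a rule."""
    rule: str
    cells: tuple


@dataclass(frozen=True)
class Fix:
    """A candidate repair: set one cell to a new value."""
    row: int
    column: str
    value: object


class Rule(ABC):
    """Hypothetical rule contract: what is wrong, and possibly how to fix it."""

    @abstractmethod
    def detect(self, rows):
        """Return a list of Violation objects found in rows."""

    def repair(self, rows, violation):
        """Return a list of Fix objects; empty by default, since repair is optional."""
        return []


class FunctionalDependency(Rule):
    """lhs -> rhs: rows that agree on lhs must also agree on rhs."""

    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

    def detect(self, rows):
        groups = defaultdict(list)
        for i, row in enumerate(rows):
            groups[row[self.lhs]].append(i)
        violations = []
        for idxs in groups.values():
            if len({rows[i][self.rhs] for i in idxs}) > 1:
                cells = tuple((i, self.rhs) for i in idxs)
                violations.append(Violation(f"{self.lhs}->{self.rhs}", cells))
        return violations

    def repair(self, rows, violation):
        # Assumed repair strategy: overwrite minority values with the most common one.
        values = [rows[i][col] for i, col in violation.cells]
        majority = max(set(values), key=values.count)
        return [Fix(i, col, majority)
                for i, col in violation.cells if rows[i][col] != majority]


def clean(rows, rules, max_rounds=10):
    """Toy core: run detect and repair for all rules until no fix is proposed."""
    rows = [dict(r) for r in rows]  # work on a copy
    for _ in range(max_rounds):
        fixes = [f for rule in rules
                 for v in rule.detect(rows)
                 for f in rule.repair(rows, v)]
        if not fixes:
            break
        for f in fixes:
            rows[f.row][f.column] = f.value
    return rows


# Made-up sample data: one zip code maps to two spellings of the same city.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "60601", "city": "Chicago"},
]
rule = FunctionalDependency("zip", "city")
cleaned = clean(rows, [rule])
```

Because `clean` only relies on the `detect` and `repair` methods, a new kind of rule can be added in this sketch by subclassing `Rule`, without touching the loop.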

Original language: English
Title of host publication: Proceedings of the VLDB Endowment
Pages: 1218-1221
Number of pages: 4
Volume: 6
Edition: 12
Publication status: Published - Aug 2013

Fingerprint

Cleaning
Metadata
User interfaces
Conditional functional dependencies (CFDs)
Repair

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

Ebaid, A., Elmagarmid, A., Ilyas, I. F., Ouzzani, M., Quiane Ruiz, J. A., Tang, N., & Yin, S. (2013). NADEEF: A generalized data cleaning system. In Proceedings of the VLDB Endowment (12 ed., Vol. 6, pp. 1218-1221)

NADEEF: A generalized data cleaning system. / Ebaid, Amr; Elmagarmid, Ahmed; Ilyas, Ihab F.; Ouzzani, Mourad; Quiane Ruiz, Jorge Arnulfo; Tang, Nan; Yin, Si.

Proceedings of the VLDB Endowment. Vol. 6. 12 ed. 2013. p. 1218-1221.

Ebaid, A, Elmagarmid, A, Ilyas, IF, Ouzzani, M, Quiane Ruiz, JA, Tang, N & Yin, S 2013, NADEEF: A generalized data cleaning system. in Proceedings of the VLDB Endowment. 12 edn, vol. 6, pp. 1218-1221.
Ebaid A, Elmagarmid A, Ilyas IF, Ouzzani M, Quiane Ruiz JA, Tang N et al. NADEEF: A generalized data cleaning system. In Proceedings of the VLDB Endowment. 12 ed. Vol. 6. 2013. p. 1218-1221
Ebaid, Amr; Elmagarmid, Ahmed; Ilyas, Ihab F.; Ouzzani, Mourad; Quiane Ruiz, Jorge Arnulfo; Tang, Nan; Yin, Si. / NADEEF: A generalized data cleaning system. Proceedings of the VLDB Endowment. Vol. 6. 12 ed. 2013. pp. 1218-1221.
@inbook{398a0c1d345c43c5a309d6ae90213f36,
title = "NADEEF: A generalized data cleaning system",
author = "Amr Ebaid and Ahmed Elmagarmid and Ilyas, {Ihab F.} and Mourad Ouzzani and {Quiane Ruiz}, {Jorge Arnulfo} and Nan Tang and Si Yin",
year = "2013",
month = "8",
language = "English",
volume = "6",
pages = "1218--1221",
booktitle = "Proceedings of the VLDB Endowment",
edition = "12",

}

TY - CHAP

T1 - NADEEF

T2 - A generalized data cleaning system

AU - Ebaid, Amr

AU - Elmagarmid, Ahmed

AU - Ilyas, Ihab F.

AU - Ouzzani, Mourad

AU - Quiane Ruiz, Jorge Arnulfo

AU - Tang, Nan

AU - Yin, Si

PY - 2013/8

Y1 - 2013/8

N2 - We present NADEEF, an extensible, generic and easy-to-deploy data cleaning system. NADEEF distinguishes between a programming interface and a core to achieve generality and extensibility. The programming interface allows users to specify data quality rules by writing code that implements predefined classes. These classes uniformly define what is wrong with the data and (possibly) how to fix it. We will demonstrate the following features provided by NADEEF. (1) Heterogeneity: The programming interface can be used to express many types of data quality rules beyond the well known CFDs (FDs), MDs and ETL rules. (2) Interdependency: The core algorithms can interleave multiple types of rules to detect and repair data errors. (3) Deployment and extensibility: Users can easily customize NADEEF by defining new types of rules, or by extending the core. (4) Metadata management and data custodians: We show a live data quality dashboard to effectively involve users in the data cleaning process.

UR - http://www.scopus.com/inward/record.url?scp=84891121005&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84891121005&partnerID=8YFLogxK

M3 - Chapter

AN - SCOPUS:84891121005

VL - 6

SP - 1218

EP - 1221

BT - Proceedings of the VLDB Endowment

ER -