Guided data repair

Mohamed Yakout, Ahmed Elmagarmid, Jennifer Neville, Mourad Ouzzani, Ihab F. Ilyas

Research output: Chapter in Book/Report/Conference proceeding > Chapter

116 Citations (Scopus)

Abstract

In this paper, we present GDR, a Guided Data Repair framework that incorporates user feedback in the cleaning process to enhance and accelerate existing automatic repair techniques while minimizing user involvement. GDR consults the user on the updates that are most likely to be beneficial in improving data quality. GDR also uses machine learning methods to identify and apply the correct updates directly to the database without the actual involvement of the user on these specific updates. To rank potential updates for consultation by the user, we first group these repairs and quantify the utility of each group using the decision-theory concept of value of information (VOI). We then apply active learning to order updates within a group based on their ability to improve the learned model. User feedback is used to repair the database and to adaptively refine the training set for the model. We empirically evaluate GDR on a real-world dataset and show significant improvement in data quality using our user-guided repairing process. We also assess the trade-off between the user effort and the resulting data quality.
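
The two-level ordering described in the abstract (rank groups of candidate updates, then order updates inside each group) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Update` record, the grouping key (attribute plus suggested value), the `group_benefit` score (a simple stand-in for VOI, not the paper's formula), and the `uncertainty` score (a generic uncertainty-sampling heuristic standing in for the active learning step).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Update:
    """Hypothetical candidate repair: set `attribute` of row `row_id` to `new_value`."""
    row_id: int
    attribute: str
    new_value: str
    p_correct: float  # assumed model estimate that the update is correct


def group_updates(updates: List[Update]) -> Dict[Tuple[str, str], List[Update]]:
    """Group updates by (attribute, suggested value); the key is an assumption."""
    groups: Dict[Tuple[str, str], List[Update]] = {}
    for u in updates:
        groups.setdefault((u.attribute, u.new_value), []).append(u)
    return groups


def group_benefit(group: List[Update]) -> float:
    """Stand-in for VOI: expected number of correct updates in the group."""
    return sum(u.p_correct for u in group)


def uncertainty(u: Update) -> float:
    """Uncertainty-sampling score: highest when p_correct is near 0.5."""
    return 1.0 - abs(2.0 * u.p_correct - 1.0)


def consultation_order(updates: List[Update]) -> List[List[Update]]:
    """Rank groups by benefit, then order each group by model uncertainty."""
    ranked = sorted(group_updates(updates).values(), key=group_benefit, reverse=True)
    return [sorted(g, key=uncertainty, reverse=True) for g in ranked]


candidates = [
    Update(1, "city", "Chicago", 0.90),
    Update(2, "city", "Chicago", 0.55),
    Update(3, "zip", "46360", 0.30),
]
order = consultation_order(candidates)
```

In this sketch the user would be shown the highest-scoring group first and, within it, the updates the model is least sure about, so that each answer is likely to be informative for retraining.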

Original language: English
Title of host publication: Proceedings of the VLDB Endowment
Pages: 279-289
Number of pages: 11
Volume: 4
Edition: 5
Publication status: Published - Feb 2011

Fingerprint

Repair
Feedback
Decision theory
Learning systems
Cleaning

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

Yakout, M., Elmagarmid, A., Neville, J., Ouzzani, M., & Ilyas, I. F. (2011). Guided data repair. In Proceedings of the VLDB Endowment (5 ed., Vol. 4, pp. 279-289)

Guided data repair. / Yakout, Mohamed; Elmagarmid, Ahmed; Neville, Jennifer; Ouzzani, Mourad; Ilyas, Ihab F.

Proceedings of the VLDB Endowment. Vol. 4 5. ed. 2011. p. 279-289.

Yakout, M, Elmagarmid, A, Neville, J, Ouzzani, M & Ilyas, IF 2011, Guided data repair. in Proceedings of the VLDB Endowment. 5 edn, vol. 4, pp. 279-289.
Yakout M, Elmagarmid A, Neville J, Ouzzani M, Ilyas IF. Guided data repair. In Proceedings of the VLDB Endowment. 5 ed. Vol. 4. 2011. p. 279-289
Yakout, Mohamed ; Elmagarmid, Ahmed ; Neville, Jennifer ; Ouzzani, Mourad ; Ilyas, Ihab F. / Guided data repair. Proceedings of the VLDB Endowment. Vol. 4 5. ed. 2011. pp. 279-289
@inbook{40721368c1214cf19b0948a65f1a53b7,
title = "Guided data repair",
abstract = "In this paper, we present GDR, a Guided Data Repair framework that incorporates user feedback in the cleaning process to enhance and accelerate existing automatic repair techniques while minimizing user involvement. GDR consults the user on the updates that are most likely to be beneficial in improving data quality. GDR also uses machine learning methods to identify and apply the correct updates directly to the database without the actual involvement of the user on these specific updates. To rank potential updates for consultation by the user, we first group these repairs and quantify the utility of each group using the decision-theory concept of value of information (VOI). We then apply active learning to order updates within a group based on their ability to improve the learned model. User feedback is used to repair the database and to adaptively refine the training set for the model. We empirically evaluate GDR on a real-world dataset and show significant improvement in data quality using our user-guided repairing process. We also assess the trade-off between the user effort and the resulting data quality.",
author = "Mohamed Yakout and Ahmed Elmagarmid and Jennifer Neville and Mourad Ouzzani and Ilyas, {Ihab F.}",
year = "2011",
month = "2",
language = "English",
volume = "4",
pages = "279--289",
booktitle = "Proceedings of the VLDB Endowment",
edition = "5",

}

TY - CHAP

T1 - Guided data repair

AU - Yakout, Mohamed

AU - Elmagarmid, Ahmed

AU - Neville, Jennifer

AU - Ouzzani, Mourad

AU - Ilyas, Ihab F.

PY - 2011/2

Y1 - 2011/2

N2 - In this paper, we present GDR, a Guided Data Repair framework that incorporates user feedback in the cleaning process to enhance and accelerate existing automatic repair techniques while minimizing user involvement. GDR consults the user on the updates that are most likely to be beneficial in improving data quality. GDR also uses machine learning methods to identify and apply the correct updates directly to the database without the actual involvement of the user on these specific updates. To rank potential updates for consultation by the user, we first group these repairs and quantify the utility of each group using the decision-theory concept of value of information (VOI). We then apply active learning to order updates within a group based on their ability to improve the learned model. User feedback is used to repair the database and to adaptively refine the training set for the model. We empirically evaluate GDR on a real-world dataset and show significant improvement in data quality using our user-guided repairing process. We also assess the trade-off between the user effort and the resulting data quality.

AB - In this paper, we present GDR, a Guided Data Repair framework that incorporates user feedback in the cleaning process to enhance and accelerate existing automatic repair techniques while minimizing user involvement. GDR consults the user on the updates that are most likely to be beneficial in improving data quality. GDR also uses machine learning methods to identify and apply the correct updates directly to the database without the actual involvement of the user on these specific updates. To rank potential updates for consultation by the user, we first group these repairs and quantify the utility of each group using the decision-theory concept of value of information (VOI). We then apply active learning to order updates within a group based on their ability to improve the learned model. User feedback is used to repair the database and to adaptively refine the training set for the model. We empirically evaluate GDR on a real-world dataset and show significant improvement in data quality using our user-guided repairing process. We also assess the trade-off between the user effort and the resulting data quality.

UR - http://www.scopus.com/inward/record.url?scp=80052313567&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80052313567&partnerID=8YFLogxK

M3 - Chapter

AN - SCOPUS:80052313567

VL - 4

SP - 279

EP - 289

BT - Proceedings of the VLDB Endowment

ER -