GDR: A system for guided data repair

Mohamed Yakout, Ahmed Elmagarmid, Jennifer Neville, Mourad Ouzzani

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

18 Citations (Scopus)

Abstract

Improving data quality is a time-consuming, labor-intensive, and often domain-specific operation. Existing data repair approaches are either fully automated or do not involve users interactively in an efficient way. We present a demo of GDR, a Guided Data Repair system that uses a novel approach to efficiently involve the user alongside automatic data repair techniques to reach better data quality as quickly as possible. Specifically, GDR generates data repairs and acquires user feedback on the repairs that would be most beneficial for improving data quality. GDR quantifies the data quality benefit of generated repairs by combining mechanisms from decision theory and active learning. Based on these benefit scores, groups of repairs are ranked and displayed to the user. User feedback is used to train a machine learning component that can eventually replace the user in deciding on the validity of a suggested repair. We describe how the generated repairs are ranked and displayed to the user in a "useful-looking" way and demonstrate how data quality can be effectively improved with minimal feedback from the user.
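The abstract does not give GDR's actual scoring formula. As a rough illustration only, the sketch below ranks hypothetical groups of suggested repairs by a score that adds a decision-theoretic term (expected quality gain: the probability a repair is correct times the gain if it is applied) to an active-learning term (the learner's uncertainty about each repair, measured as binary entropy). The `Repair` fields, the `alpha` weight, and the additive combination are all assumptions made for this example, not the method described in the paper.

```python
from dataclasses import dataclass
from math import log2


@dataclass
class Repair:
    # Hypothetical representation of one suggested cell update.
    tuple_id: int
    attribute: str
    new_value: str
    p_correct: float     # assumed: learner's estimated probability the repair is valid
    quality_gain: float  # assumed: data-quality improvement if the repair is applied


def entropy(p: float) -> float:
    """Binary entropy in bits; 0 when the learner is certain (p is 0 or 1)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def group_score(repairs: list[Repair], alpha: float = 1.0) -> float:
    """Hypothetical benefit score for asking the user about a group of repairs:
    expected quality gain plus alpha times the learner's total uncertainty."""
    expected_gain = sum(r.p_correct * r.quality_gain for r in repairs)
    uncertainty = sum(entropy(r.p_correct) for r in repairs)
    return expected_gain + alpha * uncertainty


def rank_groups(groups: dict[str, list[Repair]], alpha: float = 1.0) -> list[str]:
    """Return group keys ordered from highest to lowest score."""
    return sorted(groups, key=lambda k: group_score(groups[k], alpha), reverse=True)


if __name__ == "__main__":
    groups = {
        "uncertain-city-fixes": [Repair(1, "city", "Chicago", 0.5, 2.0)],
        "confident-zip-fixes": [Repair(2, "zip", "46204", 1.0, 1.5)],
    }
    print(rank_groups(groups))             # uncertainty boosts the first group
    print(rank_groups(groups, alpha=0.0))  # pure expected gain favors the second
```

With `alpha=1.0` the uncertain group ranks first because feedback on it would also be informative for the learner; setting `alpha=0.0` ranks purely by expected gain. How GDR actually weighs these two concerns is described in the paper, not here.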

Original language: English
Title of host publication: Proceedings of the ACM SIGMOD International Conference on Management of Data
Pages: 1223-1225
Number of pages: 3
DOI: 10.1145/1807167.1807325
Publication status: Published 23 Jul 2010
Externally published: Yes
Event: 2010 International Conference on Management of Data, SIGMOD '10, Indianapolis, IN, United States
Duration: 6 Jun 2010 to 11 Jun 2010

Keywords

  • data cleaning
  • data quality
  • data repair
  • interactive system

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Yakout, M., Elmagarmid, A., Neville, J., & Ouzzani, M. (2010). GDR: A system for guided data repair. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 1223-1225). https://doi.org/10.1145/1807167.1807325

@inproceedings{e7dc1df8a73c43f19e613b955d7f28b1,
title = "GDR: A system for guided data repair",
abstract = "Improving data quality is a time-consuming, labor-intensive, and often domain-specific operation. Existing data repair approaches are either fully automated or do not involve users interactively in an efficient way. We present a demo of GDR, a Guided Data Repair system that uses a novel approach to efficiently involve the user alongside automatic data repair techniques to reach better data quality as quickly as possible. Specifically, GDR generates data repairs and acquires user feedback on the repairs that would be most beneficial for improving data quality. GDR quantifies the data quality benefit of generated repairs by combining mechanisms from decision theory and active learning. Based on these benefit scores, groups of repairs are ranked and displayed to the user. User feedback is used to train a machine learning component that can eventually replace the user in deciding on the validity of a suggested repair. We describe how the generated repairs are ranked and displayed to the user in a {"}useful-looking{"} way and demonstrate how data quality can be effectively improved with minimal feedback from the user.",
keywords = "data cleaning, data quality, data repair, interactive system",
author = "Mohamed Yakout and Ahmed Elmagarmid and Jennifer Neville and Mourad Ouzzani",
year = "2010",
month = "7",
day = "23",
doi = "10.1145/1807167.1807325",
language = "English",
isbn = "9781450300322",
pages = "1223--1225",
booktitle = "Proceedings of the ACM SIGMOD International Conference on Management of Data",
}