BART in action: Error generation and empirical evaluations of data-cleaning systems

Donatello Santoro, Patricia C. Arocena, Boris Glavic, Giansalvatore Mecca, Renée J. Miller, Paolo Papotti

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Repairing erroneous or conflicting data that violate a set of constraints is an important problem in data management. Many automatic or semi-automatic data-repairing algorithms have been proposed in the last few years, each with its own strengths and weaknesses. Bart is an open-source error-generation system conceived to support thorough experimental evaluations of these data-repairing systems. The demo is centered around three main lessons. To start, we discuss how generating errors in data is a complex problem, with several facets. We introduce the important notions of detectability and repairability of an error, that stand at the core of Bart. Then, we show how, by changing the features of errors, it is possible to influence quite significantly the performance of the tools. Finally, we concretely put to work five data-repairing algorithms on dirty data of various kinds generated using Bart, and discuss their performance.

Original language: English
Title of host publication: SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 2161-2164
Number of pages: 4
Volume: 26-June-2016
ISBN (Electronic): 9781450335317
DOIs: https://doi.org/10.1145/2882903.2899397
Publication status: Published - 26 Jun 2016
Externally published: Yes
Event: 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016 - San Francisco, United States
Duration: 26 Jun 2016 to 1 Jul 2016

Other

Other: 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016
Country: United States
City: San Francisco
Period: 26/6/16 to 1/7/16

Fingerprint

Cleaning
Information management

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Santoro, D., Arocena, P. C., Glavic, B., Mecca, G., Miller, R. J., & Papotti, P. (2016). BART in action: Error generation and empirical evaluations of data-cleaning systems. In SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data (Vol. 26-June-2016, pp. 2161-2164). Association for Computing Machinery. https://doi.org/10.1145/2882903.2899397

BART in action : Error generation and empirical evaluations of data-cleaning systems. / Santoro, Donatello; Arocena, Patricia C.; Glavic, Boris; Mecca, Giansalvatore; Miller, Renée J.; Papotti, Paolo.

SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data. Vol. 26-June-2016 Association for Computing Machinery, 2016. p. 2161-2164.

Santoro, D, Arocena, PC, Glavic, B, Mecca, G, Miller, RJ & Papotti, P 2016, BART in action: Error generation and empirical evaluations of data-cleaning systems. in SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data. vol. 26-June-2016, Association for Computing Machinery, pp. 2161-2164, 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016, San Francisco, United States, 26/6/16. https://doi.org/10.1145/2882903.2899397
Santoro D, Arocena PC, Glavic B, Mecca G, Miller RJ, Papotti P. BART in action: Error generation and empirical evaluations of data-cleaning systems. In SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data. Vol. 26-June-2016. Association for Computing Machinery. 2016. p. 2161-2164 https://doi.org/10.1145/2882903.2899397
Santoro, Donatello ; Arocena, Patricia C. ; Glavic, Boris ; Mecca, Giansalvatore ; Miller, Renée J. ; Papotti, Paolo. / BART in action : Error generation and empirical evaluations of data-cleaning systems. SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data. Vol. 26-June-2016 Association for Computing Machinery, 2016. pp. 2161-2164
@inproceedings{501382662de2485193e7a88a883ea73e,
title = "BART in action: Error generation and empirical evaluations of data-cleaning systems",
abstract = "Repairing erroneous or conflicting data that violate a set of constraints is an important problem in data management. Many automatic or semi-automatic data-repairing algorithms have been proposed in the last few years, each with its own strengths and weaknesses. Bart is an open-source error-generation system conceived to support thorough experimental evaluations of these data-repairing systems. The demo is centered around three main lessons. To start, we discuss how generating errors in data is a complex problem, with several facets. We introduce the important notions of detectability and repairability of an error, that stand at the core of Bart. Then, we show how, by changing the features of errors, it is possible to influence quite significantly the performance of the tools. Finally, we concretely put to work five data-repairing algorithms on dirty data of various kinds generated using Bart, and discuss their performance.",
author = "Donatello Santoro and Arocena, {Patricia C.} and Boris Glavic and Giansalvatore Mecca and Miller, {Ren{\'e}e J.} and Paolo Papotti",
year = "2016",
month = "6",
day = "26",
doi = "10.1145/2882903.2899397",
language = "English",
volume = "26-June-2016",
pages = "2161--2164",
booktitle = "SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data",
publisher = "Association for Computing Machinery",

}

TY - GEN

T1 - BART in action

T2 - Error generation and empirical evaluations of data-cleaning systems

AU - Santoro, Donatello

AU - Arocena, Patricia C.

AU - Glavic, Boris

AU - Mecca, Giansalvatore

AU - Miller, Renée J.

AU - Papotti, Paolo

PY - 2016/6/26

Y1 - 2016/6/26

N2 - Repairing erroneous or conflicting data that violate a set of constraints is an important problem in data management. Many automatic or semi-automatic data-repairing algorithms have been proposed in the last few years, each with its own strengths and weaknesses. Bart is an open-source error-generation system conceived to support thorough experimental evaluations of these data-repairing systems. The demo is centered around three main lessons. To start, we discuss how generating errors in data is a complex problem, with several facets. We introduce the important notions of detectability and repairability of an error, that stand at the core of Bart. Then, we show how, by changing the features of errors, it is possible to influence quite significantly the performance of the tools. Finally, we concretely put to work five data-repairing algorithms on dirty data of various kinds generated using Bart, and discuss their performance.

AB - Repairing erroneous or conflicting data that violate a set of constraints is an important problem in data management. Many automatic or semi-automatic data-repairing algorithms have been proposed in the last few years, each with its own strengths and weaknesses. Bart is an open-source error-generation system conceived to support thorough experimental evaluations of these data-repairing systems. The demo is centered around three main lessons. To start, we discuss how generating errors in data is a complex problem, with several facets. We introduce the important notions of detectability and repairability of an error, that stand at the core of Bart. Then, we show how, by changing the features of errors, it is possible to influence quite significantly the performance of the tools. Finally, we concretely put to work five data-repairing algorithms on dirty data of various kinds generated using Bart, and discuss their performance.

UR - http://www.scopus.com/inward/record.url?scp=84979703582&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84979703582&partnerID=8YFLogxK

U2 - 10.1145/2882903.2899397

DO - 10.1145/2882903.2899397

M3 - Conference contribution

AN - SCOPUS:84979703582

VL - 26-June-2016

SP - 2161

EP - 2164

BT - SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data

PB - Association for Computing Machinery

ER -