Abstract
Repairing erroneous or conflicting data that violate a set of constraints is an important problem in data management. Many automatic or semi-automatic data-repairing algorithms have been proposed in the last few years, each with its own strengths and weaknesses. Bart is an open-source error-generation system conceived to support thorough experimental evaluations of these data-repairing systems. The demo is centered around three main lessons. To start, we discuss how generating errors in data is a complex problem, with several facets. We introduce the important notions of detectability and repairability of an error, which stand at the core of Bart. Then, we show how, by changing the features of errors, it is possible to influence quite significantly the performance of the tools. Finally, we concretely put to work five data-repairing algorithms on dirty data of various kinds generated using Bart, and discuss their performance.
Original language | English |
---|---|
Title of host publication | SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data |
Publisher | Association for Computing Machinery |
Pages | 2161-2164 |
Number of pages | 4 |
Volume | 26-June-2016 |
ISBN (Electronic) | 9781450335317 |
DOIs | 10.1145/2882903.2899397 |
Publication status | Published - 26 Jun 2016 |
Externally published | Yes |
Event | 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016 - San Francisco, United States Duration: 26 Jun 2016 → 1 Jul 2016 |
Other
Other | 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016 |
---|---|
Country | United States |
City | San Francisco |
Period | 26/6/16 → 1/7/16 |
ASJC Scopus subject areas
- Software
- Information Systems
Cite this
BART in action: Error generation and empirical evaluations of data-cleaning systems. / Santoro, Donatello; Arocena, Patricia C.; Glavic, Boris; Mecca, Giansalvatore; Miller, Renée J.; Papotti, Paolo.
SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data. Vol. 26-June-2016. Association for Computing Machinery, 2016. p. 2161-2164.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - BART in action
T2 - Error generation and empirical evaluations of data-cleaning systems
AU - Santoro, Donatello
AU - Arocena, Patricia C.
AU - Glavic, Boris
AU - Mecca, Giansalvatore
AU - Miller, Renée J.
AU - Papotti, Paolo
PY - 2016/6/26
Y1 - 2016/6/26
N2 - Repairing erroneous or conflicting data that violate a set of constraints is an important problem in data management. Many automatic or semi-automatic data-repairing algorithms have been proposed in the last few years, each with its own strengths and weaknesses. Bart is an open-source error-generation system conceived to support thorough experimental evaluations of these data-repairing systems. The demo is centered around three main lessons. To start, we discuss how generating errors in data is a complex problem, with several facets. We introduce the important notions of detectability and repairability of an error, which stand at the core of Bart. Then, we show how, by changing the features of errors, it is possible to influence quite significantly the performance of the tools. Finally, we concretely put to work five data-repairing algorithms on dirty data of various kinds generated using Bart, and discuss their performance.
AB - Repairing erroneous or conflicting data that violate a set of constraints is an important problem in data management. Many automatic or semi-automatic data-repairing algorithms have been proposed in the last few years, each with its own strengths and weaknesses. Bart is an open-source error-generation system conceived to support thorough experimental evaluations of these data-repairing systems. The demo is centered around three main lessons. To start, we discuss how generating errors in data is a complex problem, with several facets. We introduce the important notions of detectability and repairability of an error, which stand at the core of Bart. Then, we show how, by changing the features of errors, it is possible to influence quite significantly the performance of the tools. Finally, we concretely put to work five data-repairing algorithms on dirty data of various kinds generated using Bart, and discuss their performance.
UR - http://www.scopus.com/inward/record.url?scp=84979703582&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84979703582&partnerID=8YFLogxK
U2 - 10.1145/2882903.2899397
DO - 10.1145/2882903.2899397
M3 - Conference contribution
AN - SCOPUS:84979703582
VL - 26-June-2016
SP - 2161
EP - 2164
BT - SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data
PB - Association for Computing Machinery
ER -