Mapping and cleaning

Floris Geerts, Giansalvatore Mecca, Paolo Papotti, Donatello Santoro

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

30 Citations (Scopus)

Abstract

We address the challenging and open problem of bringing together two crucial activities in data integration and data quality, i.e., transforming data using schema mappings, and fixing conflicts and inconsistencies using data repairing. This problem is made complex by several factors. First, schema mappings and data repairing have traditionally been considered as separate activities, and research has progressed in a largely independent way in the two fields. Second, the elegant formalizations and the algorithms that have been proposed for both tasks have had mixed fortune in scaling to large databases. In the paper, we introduce a very general notion of a mapping and cleaning scenario that incorporates a wide variety of features, like, for example, user interventions. We develop a new semantics for these scenarios that represents a conservative extension of previous semantics for schema mappings and data repairing. Based on the semantics, we introduce a chase-based algorithm to compute solutions. Appropriate care is devoted to developing a scalable implementation of the chase algorithm. To the best of our knowledge, this is the first general and scalable proposal in this direction.

Original language: English
Title of host publication: Proceedings - International Conference on Data Engineering
Publisher: IEEE Computer Society
Pages: 232-243
Number of pages: 12
ISBN (Print): 9781479925544
DOI: https://doi.org/10.1109/ICDE.2014.6816654
Publication status: Published - 1 Jan 2014
Event: 30th IEEE International Conference on Data Engineering, ICDE 2014 - Chicago, IL, United States
Duration: 31 Mar 2014 to 4 Apr 2014

Other

Other: 30th IEEE International Conference on Data Engineering, ICDE 2014
Country: United States
City: Chicago, IL
Period: 31/3/14 to 4/4/14

Fingerprint

Cleaning
Semantics
Data integration

ASJC Scopus subject areas

  • Information Systems
  • Signal Processing
  • Software

Cite this

Geerts, F., Mecca, G., Papotti, P., & Santoro, D. (2014). Mapping and cleaning. In Proceedings - International Conference on Data Engineering (pp. 232-243). [6816654] IEEE Computer Society. https://doi.org/10.1109/ICDE.2014.6816654

@inproceedings{6a968c3434ea4eadbb00d03850ee5928,
title = "Mapping and cleaning",
author = "Floris Geerts and Giansalvatore Mecca and Paolo Papotti and Donatello Santoro",
year = "2014",
month = "1",
day = "1",
doi = "10.1109/ICDE.2014.6816654",
language = "English",
isbn = "9781479925544",
pages = "232--243",
booktitle = "Proceedings - International Conference on Data Engineering",
publisher = "IEEE Computer Society",

}

TY - GEN

T1 - Mapping and cleaning

AU - Geerts, Floris

AU - Mecca, Giansalvatore

AU - Papotti, Paolo

AU - Santoro, Donatello

PY - 2014/1/1

Y1 - 2014/1/1

UR - http://www.scopus.com/inward/record.url?scp=84901745035&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84901745035&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2014.6816654

DO - 10.1109/ICDE.2014.6816654

M3 - Conference contribution

AN - SCOPUS:84901745035

SN - 9781479925544

SP - 232

EP - 243

BT - Proceedings - International Conference on Data Engineering

PB - IEEE Computer Society

ER -