Aggregate query answering on possibilistic data with cardinality constraints

Graham Cormode, Divesh Srivastava, Entong Shen, Ting Yu

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

8 Citations (Scopus)

Abstract

Uncertainties in data can arise for a number of reasons: when data is incomplete, contains conflicting information or has been deliberately perturbed or coarsened to remove sensitive details. An important case which arises in many real applications is when the data describes a set of possibilities, but with cardinality constraints. These constraints represent correlations between tuples encoding, e.g. that at most two possible records are correct, or that there is an (unknown) one-to-one mapping between a set of tuples and attribute values. Although there has been much effort to handle uncertain data, current systems are not equipped to handle such correlations, beyond simple mutual exclusion and co-existence constraints. Vitally, they have little support for efficiently handling aggregate queries on such data. In this paper, we aim to address some of these deficiencies, by introducing LICM (Linear Integer Constraint Model), which can succinctly represent many types of tuple correlations, particularly a class of cardinality constraints. We motivate and explain the model with examples from data cleaning and masking sensitive data, to show that it enables modeling and querying such data, which was not previously possible. We develop an efficient strategy to answer conjunctive and aggregate queries on possibilistic data by describing how to implement relational operators over data in the model. LICM compactly integrates the encoding of correlations, query answering and lineage recording. In combination with off-the-shelf linear integer programming solvers, our approach provides exact bounds for aggregate queries. Our prototype implementation demonstrates that query answering with LICM can be effective and scalable.
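
To make the idea of bounding an aggregate under cardinality constraints concrete, here is a minimal Python sketch. It is not the paper's LICM implementation: the function name `aggregate_bounds`, the constraint tuple format, and the example values are all assumptions made for illustration, and brute-force enumeration of possible worlds stands in for the integer programming solver mentioned in the abstract. Each candidate tuple gets a 0/1 variable, each constraint is a linear inequality over those variables, and the bounds are the smallest and largest SUM over all feasible assignments.

```python
import operator
from itertools import product

# Comparison operators allowed in a constraint (hypothetical format).
OPS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq}


def _holds(world, constraint):
    """Check one linear constraint (coeffs, op, rhs) against a 0/1 world."""
    coeffs, op, rhs = constraint
    lhs = sum(c * x for c, x in zip(coeffs, world))
    return OPS[op](lhs, rhs)


def aggregate_bounds(values, constraints):
    """Return (min, max) of SUM over every possible world that is feasible.

    values      -- per-tuple contribution to the aggregate
    constraints -- list of (coeffs, op, rhs) linear constraints over the
                   0/1 tuple-existence variables
    Enumeration is exponential in len(values); it is used here only to keep
    the sketch dependency-free. A real system would hand the same variables
    and constraints to an integer programming solver.
    """
    feasible = []
    for world in product((0, 1), repeat=len(values)):
        if all(_holds(world, c) for c in constraints):
            feasible.append(sum(v * x for v, x in zip(values, world)))
    if not feasible:
        raise ValueError("no possible world satisfies the constraints")
    return min(feasible), max(feasible)


# Three candidate records; at least one and at most two are correct.
amounts = [10, 20, 30]
cardinality = [([1, 1, 1], "<=", 2), ([1, 1, 1], ">=", 1)]

sum_bounds = aggregate_bounds(amounts, cardinality)
count_bounds = aggregate_bounds([1, 1, 1], cardinality)
print(sum_bounds, count_bounds)
```

Passing a value of 1 for every tuple turns the same routine into a COUNT bound. The sketch covers only a single relation with a SUM-style aggregate; joins and lineage tracking, which the abstract says LICM handles, are outside its scope.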

Original language: English
Title of host publication: Proceedings - International Conference on Data Engineering
Pages: 258-269
Number of pages: 12
DOI: https://doi.org/10.1109/ICDE.2012.15
Publication status: Published - 2012
Externally published: Yes
Event: IEEE 28th International Conference on Data Engineering, ICDE 2012 - Arlington, VA, United States
Duration: 1 Apr 2012 to 5 Apr 2012

Fingerprint

Integer programming
Cleaning
Uncertainty

ASJC Scopus subject areas

  • Information Systems
  • Signal Processing
  • Software

Cite this

Cormode, G., Srivastava, D., Shen, E., & Yu, T. (2012). Aggregate query answering on possibilistic data with cardinality constraints. In Proceedings - International Conference on Data Engineering (pp. 258-269). [6228089] https://doi.org/10.1109/ICDE.2012.15

@inproceedings{cbf613af51b54289bb25f815573a5e61,
title = "Aggregate query answering on possibilistic data with cardinality constraints",
author = "Graham Cormode and Divesh Srivastava and Entong Shen and Ting Yu",
year = "2012",
doi = "10.1109/ICDE.2012.15",
language = "English",
pages = "258--269",
booktitle = "Proceedings - International Conference on Data Engineering",

}

TY - GEN

T1 - Aggregate query answering on possibilistic data with cardinality constraints

AU - Cormode, Graham

AU - Srivastava, Divesh

AU - Shen, Entong

AU - Yu, Ting

PY - 2012

Y1 - 2012

UR - http://www.scopus.com/inward/record.url?scp=84864239686&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84864239686&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2012.15

DO - 10.1109/ICDE.2012.15

M3 - Conference contribution

SP - 258

EP - 269

BT - Proceedings - International Conference on Data Engineering

ER -