Empirical privacy and empirical utility of anonymized data

Graham Cormode, Cecilia M. Procopiuc, Entong Shen, Divesh Srivastava, Ting Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

22 Citations (Scopus)

Abstract

Procedures to anonymize data sets are vital for companies, government agencies and other bodies to meet their obligations to share data without compromising the privacy of the individuals contributing to it. Despite much work on this topic, the area has not yet reached stability. Early models (k-anonymity and ℓ-diversity) are now thought to offer insufficient privacy. Noise-based methods like differential privacy are seen as providing stronger privacy, but less utility. However, across all methods, sensitive information about some individuals can often be inferred with relatively high accuracy. In this paper, we reverse the idea of a 'privacy attack' by incorporating it into a measure of privacy. Hence, we advocate the notion of empirical privacy, based on the posterior beliefs of an adversary and their ability to draw inferences about sensitive values in the data. This is not a new model, but rather a unifying view: it allows us to study several well-known privacy models that are not otherwise directly comparable. We also consider an empirical approach to measuring utility, based on a workload of queries. Consequently, we are able to place different privacy models, including differential privacy and early syntactic models, on the same scale, and compare their privacy/utility tradeoffs. We learn that, in practice, the difference between differential privacy and various syntactic models is less dramatic than previously thought, but there are still clear domination relations between them.
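
The abstract describes an evaluation loop: measure empirical privacy by how accurately an adversary can infer sensitive values from the released data, and empirical utility by the error of a query workload answered on that data. As a rough illustration of this idea (not the authors' actual code: the toy data, the additive-noise stand-in for an anonymizer, and the naive Bayes adversary below are all illustrative assumptions), one could proceed as follows:

import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Toy "original" table: two quasi-identifier columns and a binary
# sensitive attribute correlated with the first of them.
n = 2000
qi = rng.normal(size=(n, 2))
sensitive = (qi[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

# Stand-in anonymizer: additive noise on the quasi-identifiers. A real
# study would substitute k-anonymity, l-diversity, or a differentially
# private mechanism here.
def anonymize(table, noise_scale):
    return table + rng.normal(scale=noise_scale, size=table.shape)

anon_qi = anonymize(qi, noise_scale=1.0)

# Empirical privacy: the adversary learns from the released table
# (anonymized quasi-identifiers plus sensitive values), then predicts
# sensitive values from the true quasi-identifiers it holds as
# background knowledge. Higher accuracy = lower empirical privacy.
adversary = GaussianNB().fit(anon_qi, sensitive)
inference_accuracy = accuracy_score(sensitive, adversary.predict(qi))

# Empirical utility: answer a workload of range-count queries on the
# released data and measure relative error against the true answers.
def count_query(table, lo, hi):
    return np.sum((table[:, 0] >= lo) & (table[:, 0] < hi))

workload = [(-2.0, -1.0), (-1.0, 0.0), (0.0, 1.0), (1.0, 2.0)]
errors = [abs(count_query(anon_qi, lo, hi) - count_query(qi, lo, hi))
          / max(count_query(qi, lo, hi), 1)
          for lo, hi in workload]

print(f"adversary inference accuracy (lower is more private): {inference_accuracy:.3f}")
print(f"mean relative workload error (lower is more useful):  {np.mean(errors):.3f}")

Sweeping the noise scale (or the privacy parameter of whichever mechanism is plugged in) and plotting inference accuracy against workload error gives the kind of common privacy/utility scale on which the paper compares differential privacy with syntactic models.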

Original language: English
Title of host publication: Proceedings - International Conference on Data Engineering
Pages: 77-82
Number of pages: 6
DOI: 10.1109/ICDEW.2013.6547431
ISBN (Print): 9781467353021
Publication status: Published - 19 Aug 2013
Externally published: Yes
Event: 2013 IEEE 29th International Conference on Data Engineering Workshops, ICDEW 2013 - Brisbane, QLD, Australia
Duration: 8 Apr 2013 – 11 Apr 2013


ASJC Scopus subject areas

  • Information Systems
  • Signal Processing
  • Software

Cite this

Cormode, G., Procopiuc, C. M., Shen, E., Srivastava, D., & Yu, T. (2013). Empirical privacy and empirical utility of anonymized data. In Proceedings - International Conference on Data Engineering (pp. 77-82). Article 6547431. https://doi.org/10.1109/ICDEW.2013.6547431
