UMicS

From anonymized data to usable MicroData

Graham Cormode, Entong Shen, Xi Gong, Ting Yu, Cecilia M. Procopiuc, Divesh Srivastava

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

1 Citation (Scopus)

Abstract

There is currently a tug-of-war over data releases. On one side, there are many strong reasons pulling to release data to other parties: business factors, freedom of information rules, and scientific sharing agreements. On the other side, concerns about individual privacy pull back, and seek to limit releases. Privacy technologies such as differential privacy have been proposed to resolve this deadlock, and there has been much study of how to perform private release of data in various forms. The focus of such works has been largely on the data owner: what process should they apply to ensure that the released data preserves privacy whilst still capturing the input data distribution accurately? Almost no attention has been paid to the needs of the data user, who wants to make use of the released data within their existing suite of tools and data. The difficulty of making use of data releases is a major stumbling block for the widespread adoption of data privacy technologies. In this paper, instead of proposing new privacy mechanisms for data publishing, we consider the whole data release process, from the data owner to the data user. We lay out a set of principles for privacy tool design that highlight the requirements for interoperability, extensibility and scalability. We put these into practice with UMicS, an end-to-end prototype system to control the release and use of private data. An overarching tenet is that it should be possible to integrate the released data into the data user's systems with minimal change and cost. We describe how to instantiate UMicS in a variety of usage scenarios. We show how using data modeling techniques from machine learning can improve utility, in particular when combined with background knowledge that the data user may possess. We implement UMicS, and evaluate it over a selection of data sets and release cases. We see that UMicS allows for very effective use of released data, while upholding our privacy principles.

Original language: English
Title of host publication: International Conference on Information and Knowledge Management, Proceedings
Pages: 2255-2260
Number of pages: 6
DOIs: https://doi.org/10.1145/2505515.2505737
Publication status: Published - 11 Dec 2013
Externally published: Yes
Event: 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013 - San Francisco, CA, United States
Duration: 27 Oct 2013 to 1 Nov 2013

Other

Other: 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013
Country: United States
City: San Francisco, CA
Period: 27/10/13 to 1/11/13

Fingerprint

Micro data
Privacy
Owners
Costs
Interoperability
Scenarios
Layout
Data modeling
Pull
Prototype
Machine learning
Factors
Scalability
Freedom of information
Deadlock

Keywords

  • Data release
  • Differential privacy

ASJC Scopus subject areas

  • Business, Management and Accounting (all)
  • Decision Sciences (all)

Cite this

Cormode, G., Shen, E., Gong, X., Yu, T., Procopiuc, C. M., & Srivastava, D. (2013). UMicS: From anonymized data to usable MicroData. In International Conference on Information and Knowledge Management, Proceedings (pp. 2255-2260). https://doi.org/10.1145/2505515.2505737

@inproceedings{650a7f127034405f81796a9599955d65,
title = "UMicS: From anonymized data to usable MicroData",
abstract = "There is currently a tug-of-war going on surrounding data releases. On one side, there are many strong reasons pulling to release data to other parties: business factors, freedom of information rules, and scientific sharing agreements. On the other side, concerns about individual privacy pull back, and seek to limit releases. Privacy technologies such as differential privacy have been proposed to resolve this deadlock, and there has been much study of how to perform private data release of data in various forms. The focus of such works has been largely on the data owner: what process should they apply to ensure that the released data preserves privacy whilst still capturing the input data distribution accurately. Almost no attention has been paid to the needs of the data user, who wants to make use of the released data within their existing suite of tools and data. The difficulty of making use of data releases is a major stumbling block for the widespread adoption of data privacy technologies. In this paper, instead of proposing new privacy mechanisms for data publishing, we consider the whole data release process, from the data owner to the data user. We lay out a set of principles for privacy tool design that highlights the requirements for interoperability, extensibility and scalability. We put these into practice with UMicS, an end-to-end prototype system to control the release and use of private data. An overarching tenet is that it should be possible to integrate the released data into the data user's systems with the minimum of change and cost. We describe how to instantiate UMicS in a variety of usage scenarios. We show how using data modeling techniques from machine learning can improve the utility, in particular when combined with background knowledge that the data user may possess. We implement UMicS, and evaluate it over a selection of data sets and release cases. 
We see that UMicS allows for very effective use of released data, while upholding our privacy principles.",
keywords = "Data release, Differential privacy",
author = "Graham Cormode and Entong Shen and Xi Gong and Ting Yu and Procopiuc, {Cecilia M.} and Divesh Srivastava",
year = "2013",
month = "12",
day = "11",
doi = "10.1145/2505515.2505737",
language = "English",
isbn = "9781450322638",
pages = "2255--2260",
booktitle = "International Conference on Information and Knowledge Management, Proceedings",

}

TY - GEN

T1 - UMicS

T2 - From anonymized data to usable MicroData

AU - Cormode, Graham

AU - Shen, Entong

AU - Gong, Xi

AU - Yu, Ting

AU - Procopiuc, Cecilia M.

AU - Srivastava, Divesh

PY - 2013/12/11

Y1 - 2013/12/11

AB - There is currently a tug-of-war going on surrounding data releases. On one side, there are many strong reasons pulling to release data to other parties: business factors, freedom of information rules, and scientific sharing agreements. On the other side, concerns about individual privacy pull back, and seek to limit releases. Privacy technologies such as differential privacy have been proposed to resolve this deadlock, and there has been much study of how to perform private data release of data in various forms. The focus of such works has been largely on the data owner: what process should they apply to ensure that the released data preserves privacy whilst still capturing the input data distribution accurately. Almost no attention has been paid to the needs of the data user, who wants to make use of the released data within their existing suite of tools and data. The difficulty of making use of data releases is a major stumbling block for the widespread adoption of data privacy technologies. In this paper, instead of proposing new privacy mechanisms for data publishing, we consider the whole data release process, from the data owner to the data user. We lay out a set of principles for privacy tool design that highlights the requirements for interoperability, extensibility and scalability. We put these into practice with UMicS, an end-to-end prototype system to control the release and use of private data. An overarching tenet is that it should be possible to integrate the released data into the data user's systems with the minimum of change and cost. We describe how to instantiate UMicS in a variety of usage scenarios. We show how using data modeling techniques from machine learning can improve the utility, in particular when combined with background knowledge that the data user may possess. We implement UMicS, and evaluate it over a selection of data sets and release cases. 
We see that UMicS allows for very effective use of released data, while upholding our privacy principles.

KW - Data release

KW - Differential privacy

UR - http://www.scopus.com/inward/record.url?scp=84889580037&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84889580037&partnerID=8YFLogxK

U2 - 10.1145/2505515.2505737

DO - 10.1145/2505515.2505737

M3 - Conference contribution

SN - 9781450322638

SP - 2255

EP - 2260

BT - International Conference on Information and Knowledge Management, Proceedings

ER -