Distributed Cardinality Estimation of Set Operations with Differential Privacy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

In this paper we study the problem of estimating the cardinality of pairwise set operations (union and intersection) over sets held by different data owners while preserving differential privacy. In our setting, a data owner can communicate only with an untrusted server and therefore has to perturb its set data for privacy protection before sharing it with the server. This setting is relevant to diverse practical applications, including sensor-based traffic monitoring, cross-domain data integration, and combining findings from multiple surveys. To tackle the problem, we first adopt an existing randomized response technique to perturb the bit vector that represents a set (to achieve differential privacy) and develop tools that the server can use to derive the cardinality of set operations from the randomized bit vectors. However, the variance of the union/intersection estimator grows linearly with the universe (bit-vector) size, which makes this approach impractical for large universes. To keep the variance low, we additionally propose using Bloom filters (BFs) instead of high-dimensional bit vectors to share set data with the server. The key insight is that, despite the inevitable collisions in a BF, keeping its size small lets us bound the variance of the union/intersection cardinality estimators. Finally, we show that investing a small part of the privacy budget in reporting the (obfuscated) set cardinality can further reduce the estimator errors by up to 20%. Our empirical analysis reveals the impact of various parameters, including the privacy budget and the Bloom filter size, on the overall accuracy of the approach and demonstrates the utility of the proposed solution.
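
The first step described in the abstract (randomized response on a bit vector, followed by server-side estimation) can be illustrated with a minimal sketch. It assumes the textbook symmetric randomized response, in which each bit is flipped independently with probability q = 1 / (1 + e^ε), and a simple per-bit debiasing estimator; the paper's actual estimators may differ. All function names here are hypothetical and chosen only for illustration.

```python
import math
import random


def to_bits(items, m):
    """Encode a set of integers from range(m) as a 0/1 list of length m."""
    bits = [0] * m
    for i in items:
        bits[i] = 1
    return bits


def perturb(bits, epsilon, rng):
    """Data-owner side: flip each bit independently with probability
    q = 1 / (1 + e^epsilon). This is an assumed, standard randomized
    response scheme; epsilon must be > 0."""
    q = 1.0 / (1.0 + math.exp(epsilon))
    return [b ^ 1 if rng.random() < q else b for b in bits]


def debias(noisy_bits, epsilon):
    """Server side: map each noisy bit to an unbiased estimate of the
    true bit, using E[noisy] = q + true * (p - q) with p = 1 - q."""
    q = 1.0 / (1.0 + math.exp(epsilon))
    p = 1.0 - q
    return [(b - q) / (p - q) for b in noisy_bits]


def estimate_union_intersection(noisy_a, noisy_b, epsilon):
    """Estimate |A ∪ B| and |A ∩ B| from two independently perturbed vectors.
    The product of two independent unbiased bit estimates is an unbiased
    estimate of the AND of the true bits; the union follows from
    inclusion-exclusion."""
    a_hat = debias(noisy_a, epsilon)
    b_hat = debias(noisy_b, epsilon)
    inter = sum(x * y for x, y in zip(a_hat, b_hat))
    union = sum(a_hat) + sum(b_hat) - inter
    return union, inter


if __name__ == "__main__":
    rng = random.Random(0)
    m = 20000                      # universe size (illustrative value)
    set_a = set(range(0, 3000))    # |A| = 3000
    set_b = set(range(2000, 6000)) # |B| = 4000, |A ∩ B| = 1000
    eps = 2.0
    noisy_a = perturb(to_bits(set_a, m), eps, rng)
    noisy_b = perturb(to_bits(set_b, m), eps, rng)
    union_est, inter_est = estimate_union_intersection(noisy_a, noisy_b, eps)
    print("estimated union:", round(union_est))
    print("estimated intersection:", round(inter_est))
```

In this sketch every one of the m positions contributes noise to the sums, including positions that belong to neither set, so the estimator variance scales with m. That is consistent with the linear growth in universe size that the abstract cites as the motivation for moving to small Bloom filters.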

Original language: English
Title of host publication: Proceedings - 2017 IEEE Symposium on Privacy-Aware Computing, PAC 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 37-48
Number of pages: 12
Volume: 2017-January
ISBN (Electronic): 9781538610275
DOIs: https://doi.org/10.1109/PAC.2017.43
Publication status: Published - 4 Dec 2017
Event: 1st IEEE Symposium on Privacy-Aware Computing, PAC 2017 - Washington, United States
Duration: 1 Aug 2017 to 3 Aug 2017

Other

Other: 1st IEEE Symposium on Privacy-Aware Computing, PAC 2017
Country: United States
City: Washington
Period: 1/8/17 to 3/8/17

Fingerprint

Servers
Data integration
Monitoring
Sensors

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Safety, Risk, Reliability and Quality

Cite this

Stanojevic, R., Nabeel, M., & Yu, T. (2017). Distributed Cardinality Estimation of Set Operations with Differential Privacy. In Proceedings - 2017 IEEE Symposium on Privacy-Aware Computing, PAC 2017 (Vol. 2017-January, pp. 37-48). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/PAC.2017.43

@inproceedings{fb553ab758d4475a8aec185677e9d955,
title = "Distributed Cardinality Estimation of Set Operations with Differential Privacy",
author = "Rade Stanojevic and Mohamed Nabeel and Ting Yu",
year = "2017",
month = "12",
day = "4",
doi = "10.1109/PAC.2017.43",
language = "English",
volume = "2017-January",
pages = "37--48",
booktitle = "Proceedings - 2017 IEEE Symposium on Privacy-Aware Computing, PAC 2017",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Distributed Cardinality Estimation of Set Operations with Differential Privacy

AU - Stanojevic, Rade

AU - Nabeel, Mohamed

AU - Yu, Ting

PY - 2017/12/4

Y1 - 2017/12/4

UR - http://www.scopus.com/inward/record.url?scp=85046546726&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85046546726&partnerID=8YFLogxK

U2 - 10.1109/PAC.2017.43

DO - 10.1109/PAC.2017.43

M3 - Conference contribution

AN - SCOPUS:85046546726

VL - 2017-January

SP - 37

EP - 48

BT - Proceedings - 2017 IEEE Symposium on Privacy-Aware Computing, PAC 2017

PB - Institute of Electrical and Electronics Engineers Inc.

ER -