Heavy hitter estimation over set-valued data with local differential privacy

Zhan Qin, Yin Yang, Ting Yu, Issa Khalil, Xiaokui Xiao, Kui Ren

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

65 Citations (Scopus)

Abstract

In local differential privacy (LDP), each user perturbs her data locally before sending the noisy data to a data collector. The latter then analyzes the data to obtain useful statistics. Unlike the setting of centralized differential privacy, in LDP the data collector never gains access to the exact values of sensitive data, which protects not only the privacy of data contributors but also the collector itself against the risk of potential data leakage. Existing LDP solutions in the literature are mostly limited to the case where each user possesses a tuple of numeric or categorical values, and the data collector computes basic statistics such as counts or mean values. To the best of our knowledge, no existing work tackles more complex data mining tasks such as heavy hitter discovery over set-valued data. In this paper, we present a systematic study of heavy hitter mining under LDP. We first review existing solutions, extend them to heavy hitter estimation, and explain why their effectiveness is limited. We then propose LDPMiner, a two-phase mechanism for obtaining accurate heavy hitters with LDP. The main idea is to first gather a candidate set of heavy hitters using a portion of the privacy budget, and focus the remaining budget on refining the candidate set in a second phase, which is much more efficient budget-wise than obtaining the heavy hitters directly from the whole dataset. We provide both in-depth theoretical analysis and extensive experiments to compare LDPMiner against adaptations of previous solutions. The results show that LDPMiner significantly improves over existing methods. More importantly, LDPMiner successfully identifies the majority of true heavy hitters in practical settings.
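
The two-phase idea described in the abstract can be illustrated with a minimal Python sketch. This is not the LDPMiner algorithm from the paper: it swaps in generalized randomized response (k-RR) as the perturbation primitive, has each user report a single item sampled from her set, and splits the budget evenly by default. The names `krr_perturb`, `krr_estimate`, `run_phase` and `two_phase_heavy_hitters`, the `OTHER` dummy symbol, the `split` parameter and the choice of 2k candidates are all assumptions made for illustration.

```python
import math
import random
from collections import Counter

OTHER = "<other>"  # hypothetical dummy symbol for "no item in the allowed set"


def krr_perturb(value, domain, eps, rng):
    """k-RR: keep `value` with prob. e^eps / (e^eps + d - 1), else report another value."""
    d = len(domain)
    p_keep = math.exp(eps) / (math.exp(eps) + d - 1)
    if rng.random() < p_keep:
        return value
    return rng.choice([v for v in domain if v != value])


def krr_estimate(reports, domain, eps):
    """Unbiased estimate of the fraction of users whose (unperturbed) report was each value."""
    d, n = len(domain), len(reports)
    p = math.exp(eps) / (math.exp(eps) + d - 1)  # prob. of reporting the true value
    q = 1.0 / (math.exp(eps) + d - 1)            # prob. of reporting one specific other value
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


def run_phase(user_sets, candidates, eps, rng):
    """One round: each user samples one allowed item (or OTHER) and reports it via k-RR."""
    domain = list(candidates) + [OTHER]
    allowed = set(candidates)
    reports = []
    for items in user_sets:
        pool = sorted(set(items) & allowed)      # sorted for reproducible sampling
        true_value = rng.choice(pool) if pool else OTHER
        reports.append(krr_perturb(true_value, domain, eps, rng))
    scores = krr_estimate(reports, domain, eps)
    del scores[OTHER]
    return scores


def two_phase_heavy_hitters(user_sets, domain, k, eps, split=0.5, seed=0):
    """Phase 1 picks 2k candidates with eps1; phase 2 re-ranks them with eps2 = eps - eps1."""
    rng = random.Random(seed)
    eps1 = split * eps
    eps2 = eps - eps1
    scores1 = run_phase(user_sets, domain, eps1, rng)
    candidates = sorted(scores1, key=scores1.get, reverse=True)[: 2 * k]
    scores2 = run_phase(user_sets, candidates, eps2, rng)
    return sorted(scores2, key=scores2.get, reverse=True)[:k]


if __name__ == "__main__":
    users = [{"a"}] * 150 + [{"b"}] * 60 + [{"a", "b", "c"}] * 40 + [{"d"}] * 5
    print(two_phase_heavy_hitters(users, list("abcdef"), k=2, eps=4.0))
```

In this sketch each user sends one eps1-LDP report and one eps2-LDP report, so sequential composition bounds the total privacy cost by eps1 + eps2 = eps. The second round perturbs over a much smaller domain, which is where the budget saving comes from. The scores rank sampled items and are not exact item frequencies, since a user with a larger set reports each of her items less often.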

Original language: English
Title of host publication: CCS 2016 - Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security
Publisher: Association for Computing Machinery
Pages: 192-203
Number of pages: 12
Volume: 24-28-October-2016
ISBN (Electronic): 9781450341394
DOI: https://doi.org/10.1145/2976749.2978409
Publication status: Published - 24 Oct 2016
Event: 23rd ACM Conference on Computer and Communications Security, CCS 2016 - Vienna, Austria
Duration: 24 Oct 2016 to 28 Oct 2016

Other

Other: 23rd ACM Conference on Computer and Communications Security, CCS 2016
Country: Austria
City: Vienna
Period: 24/10/16 to 28/10/16

Keywords

  • Heavy hitter
  • Local differential privacy

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications

Cite this

Qin, Z., Yang, Y., Yu, T., Khalil, I., Xiao, X., & Ren, K. (2016). Heavy hitter estimation over set-valued data with local differential privacy. In CCS 2016 - Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (Vol. 24-28-October-2016, pp. 192-203). Association for Computing Machinery. https://doi.org/10.1145/2976749.2978409

@inproceedings{1c1de83240f74c06a39d5722f5bf3bcf,
title = "Heavy hitter estimation over set-valued data with local differential privacy",
abstract = "In local differential privacy (LDP), each user perturbs her data locally before sending the noisy data to a data collector. The latter then analyzes the data to obtain useful statistics. Unlike the setting of centralized differential privacy, in LDP the data collector never gains access to the exact values of sensitive data, which protects not only the privacy of data contributors but also the collector itself against the risk of potential data leakage. Existing LDP solutions in the literature are mostly limited to the case where each user possesses a tuple of numeric or categorical values, and the data collector computes basic statistics such as counts or mean values. To the best of our knowledge, no existing work tackles more complex data mining tasks such as heavy hitter discovery over set-valued data. In this paper, we present a systematic study of heavy hitter mining under LDP. We first review existing solutions, extend them to heavy hitter estimation, and explain why their effectiveness is limited. We then propose LDPMiner, a two-phase mechanism for obtaining accurate heavy hitters with LDP. The main idea is to first gather a candidate set of heavy hitters using a portion of the privacy budget, and focus the remaining budget on refining the candidate set in a second phase, which is much more efficient budget-wise than obtaining the heavy hitters directly from the whole dataset. We provide both in-depth theoretical analysis and extensive experiments to compare LDPMiner against adaptations of previous solutions. The results show that LDPMiner significantly improves over existing methods. More importantly, LDPMiner successfully identifies the majority of true heavy hitters in practical settings.",
keywords = "Heavy hitter, Local differential privacy",
author = "Zhan Qin and Yin Yang and Ting Yu and Issa Khalil and Xiaokui Xiao and Kui Ren",
year = "2016",
month = "10",
day = "24",
doi = "10.1145/2976749.2978409",
language = "English",
volume = "24-28-October-2016",
pages = "192--203",
booktitle = "CCS 2016 - Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security",
publisher = "Association for Computing Machinery",

}

TY - GEN
T1 - Heavy hitter estimation over set-valued data with local differential privacy
AU - Qin, Zhan
AU - Yang, Yin
AU - Yu, Ting
AU - Khalil, Issa
AU - Xiao, Xiaokui
AU - Ren, Kui
PY - 2016/10/24
Y1 - 2016/10/24
AB - In local differential privacy (LDP), each user perturbs her data locally before sending the noisy data to a data collector. The latter then analyzes the data to obtain useful statistics. Unlike the setting of centralized differential privacy, in LDP the data collector never gains access to the exact values of sensitive data, which protects not only the privacy of data contributors but also the collector itself against the risk of potential data leakage. Existing LDP solutions in the literature are mostly limited to the case where each user possesses a tuple of numeric or categorical values, and the data collector computes basic statistics such as counts or mean values. To the best of our knowledge, no existing work tackles more complex data mining tasks such as heavy hitter discovery over set-valued data. In this paper, we present a systematic study of heavy hitter mining under LDP. We first review existing solutions, extend them to heavy hitter estimation, and explain why their effectiveness is limited. We then propose LDPMiner, a two-phase mechanism for obtaining accurate heavy hitters with LDP. The main idea is to first gather a candidate set of heavy hitters using a portion of the privacy budget, and focus the remaining budget on refining the candidate set in a second phase, which is much more efficient budget-wise than obtaining the heavy hitters directly from the whole dataset. We provide both in-depth theoretical analysis and extensive experiments to compare LDPMiner against adaptations of previous solutions. The results show that LDPMiner significantly improves over existing methods. More importantly, LDPMiner successfully identifies the majority of true heavy hitters in practical settings.
KW - Heavy hitter
KW - Local differential privacy
UR - http://www.scopus.com/inward/record.url?scp=84995468307&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84995468307&partnerID=8YFLogxK
U2 - 10.1145/2976749.2978409
DO - 10.1145/2976749.2978409
M3 - Conference contribution
VL - 24-28-October-2016
SP - 192
EP - 203
BT - CCS 2016 - Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security
PB - Association for Computing Machinery
ER -