PrivTrie: Effective Frequent Term Discovery under Local Differential Privacy

Ning Wang, Xiaokui Xiao, Yin Yang, Ta Duy Hoang, Hyejin Shin, Junbum Shin, Ge Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

A mobile operating system often needs to collect frequent new terms from users in order to build and maintain a comprehensive dictionary. Collecting keyboard usage data, however, raises privacy concerns. Local differential privacy (LDP) has been established as a strong privacy standard for collecting sensitive information from users. Currently, the best-known solution for LDP-compliant frequent term discovery transforms the problem into collecting n-grams under LDP, and then reconstructs terms from the collected n-grams by modelling them as a graph and identifying cliques in this graph. Because the transformed problem (collecting n-grams) is very different from the original one (discovering frequent terms), the end result has poor utility. Further, this method is rather expensive due to clique computation on a large graph. In this paper we tackle the problem head-on: our proposal, PrivTrie, directly collects frequent terms from users by iteratively constructing a trie under LDP. While building a trie is an obvious choice of methodology, obtaining an accurate trie under LDP is highly challenging. PrivTrie achieves this with a novel adaptive approach that conserves privacy budget by building internal nodes of the trie with the lowest level of accuracy necessary. Experiments using real datasets confirm that PrivTrie achieves high accuracy at common privacy levels and consistently outperforms all previous methods.
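The abstract describes building a trie under LDP but gives no algorithmic detail. The Python sketch below is a deliberately naive baseline under stated assumptions, not PrivTrie itself: each user appends a hypothetical `$` end marker to their term, every user answers one generalized randomized response (k-RR) query per trie level, ε is split evenly across levels by sequential composition, and a child node is kept when its estimated count reaches a hypothetical `threshold` fraction of users. The names `grr_perturb`, `grr_estimate` and `naive_ldp_trie` are illustrative only.

```python
import math
import random
from collections import Counter

END = "$"     # hypothetical end-of-term marker appended to every user's term
OTHER = None  # catch-all bucket for users whose prefix is not a current candidate


def grr_perturb(value, domain, epsilon, rng):
    """Generalized randomized response (k-RR): keep `value` with probability
    e^eps / (e^eps + k - 1), otherwise report another domain value uniformly."""
    k = len(domain)
    e = math.exp(epsilon)
    if k == 1 or rng.random() < e / (e + k - 1):
        return value
    return rng.choice([v for v in domain if v != value])


def grr_estimate(reports, domain, epsilon):
    """Unbiased count estimate per domain value: (observed - n*q) / (p - q)."""
    n, k, e = len(reports), len(domain), math.exp(epsilon)
    p, q = e / (e + k - 1), 1.0 / (e + k - 1)
    counts = Counter(reports)
    return {v: (counts[v] - n * q) / (p - q) for v in domain}


def naive_ldp_trie(terms, alphabet, epsilon, threshold, max_depth, seed=0):
    """Grow a trie one level per round, spending epsilon / max_depth per round.

    Hypothetical uniform-budget baseline, NOT PrivTrie's adaptive scheme.
    Terms longer than max_depth - 1 characters can never be completed.
    Returns {term: estimated count} for completed (END-terminated) branches.
    """
    rng = random.Random(seed)
    users = [t + END for t in terms]
    eps_level = epsilon / max_depth  # sequential composition: total cost is epsilon
    symbols = list(alphabet) + [END]
    frontier, found = [""], {}
    for depth in range(1, max_depth + 1):
        # Candidate children of every prefix that survived the previous round.
        candidates = [p + s for p in frontier for s in symbols]
        cand_set = set(candidates)
        domain = candidates + [OTHER]
        reports = []
        for u in users:
            prefix = u[:depth]
            truth = prefix if prefix in cand_set else OTHER
            reports.append(grr_perturb(truth, domain, eps_level, rng))
        est = grr_estimate(reports, domain, eps_level)
        # Keep children whose estimated count clears the threshold share of users.
        kept = [c for c in candidates if est[c] >= threshold * len(users)]
        found.update({c[:-1]: est[c] for c in kept if c.endswith(END)})
        frontier = [c for c in kept if not c.endswith(END)]
        if not frontier:
            break
    return found
```

Because every round spends ε / max_depth, a deeper trie leaves less budget for every node, including internal nodes whose counts only need to be accurate enough to decide whether to expand them. This uniform split is the kind of waste that the adaptive allocation described in the abstract is meant to avoid.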

Original language: English
Title of host publication: Proceedings - IEEE 34th International Conference on Data Engineering, ICDE 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 821-832
Number of pages: 12
ISBN (Electronic): 9781538655207
DOI: https://doi.org/10.1109/ICDE.2018.00079
Publication status: Published - 24 Oct 2018
Event: 34th IEEE International Conference on Data Engineering, ICDE 2018 - Paris, France
Duration: 16 Apr 2018 to 19 Apr 2018

Keywords

  • Frequent term
  • local differential privacy
  • Trie

ASJC Scopus subject areas

  • Information Systems
  • Information Systems and Management
  • Hardware and Architecture

Cite this

Wang, N., Xiao, X., Yang, Y., Hoang, T. D., Shin, H., Shin, J., & Yu, G. (2018). PrivTrie: Effective frequent term discovery under local differential privacy. In Proceedings - IEEE 34th International Conference on Data Engineering, ICDE 2018 (pp. 821-832). [8509300] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDE.2018.00079

@inproceedings{384295ed64ef4899977f86b068247c7d,
title = "{PrivTrie}: Effective Frequent Term Discovery under Local Differential Privacy",
keywords = "Frequent term, local differential privacy, Trie",
author = "Ning Wang and Xiaokui Xiao and Yin Yang and Hoang, {Ta Duy} and Hyejin Shin and Junbum Shin and Ge Yu",
year = "2018",
month = "10",
day = "24",
doi = "10.1109/ICDE.2018.00079",
language = "English",
pages = "821--832",
booktitle = "Proceedings - IEEE 34th International Conference on Data Engineering, ICDE 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - PrivTrie: effective frequent term discovery under local differential privacy

AU - Wang, Ning

AU - Xiao, Xiaokui

AU - Yang, Yin

AU - Hoang, Ta Duy

AU - Shin, Hyejin

AU - Shin, Junbum

AU - Yu, Ge

PY - 2018/10/24

Y1 - 2018/10/24

AB - A mobile operating system often needs to collect frequent new terms from users in order to build and maintain a comprehensive dictionary. Collecting keyboard usage data, however, raises privacy concerns. Local differential privacy (LDP) has been established as a strong privacy standard for collecting sensitive information from users. Currently, the best known solution for LDP-compliant frequent term discovery transforms the problem into collecting n-grams under LDP, and subsequently reconstructs terms from the collected n-grams by modelling the latter into a graph, and identifying cliques on this graph. Because the transformed problem (i.e., collecting n-grams) is very different from the original one (discovering frequent terms), the end result has poor utility. Further, this method is also rather expensive due to clique computation on a large graph. In this paper we tackle the problem head on: our proposal, PrivTrie, directly collects frequent terms from users by iteratively constructing a trie under LDP. While the methodology of building a trie is an obvious choice, obtaining an accurate trie under LDP is highly challenging. PrivTrie achieves this with a novel adaptive approach that conserves privacy budget by building internal nodes of the trie with the lowest level of accuracy necessary. Experiments using real datasets confirm that PrivTrie achieves high accuracy on common privacy levels, and consistently outperforms all previous methods.

KW - Frequent term

KW - local differential privacy

KW - Trie

UR - http://www.scopus.com/inward/record.url?scp=85057076327&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85057076327&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2018.00079

DO - 10.1109/ICDE.2018.00079

M3 - Conference contribution

AN - SCOPUS:85057076327

SP - 821

EP - 832

BT - Proceedings - IEEE 34th International Conference on Data Engineering, ICDE 2018

PB - Institute of Electrical and Electronics Engineers Inc.

ER -