Collecting and analyzing multidimensional data with local differential privacy

Ning Wang, Xiaokui Xiao, Yin Yang, Jun Zhao, Siu Cheung Hui, Hyejin Shin, Junbum Shin, Ge Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Local differential privacy (LDP) is a recently proposed privacy standard for collecting and analyzing data, which has been used, e.g., in the Chrome browser, iOS and macOS. In LDP, each user perturbs her information locally, and only sends the randomized version to an aggregator who performs analyses, which protects both the users and the aggregator against private information leaks. Although LDP has attracted much research attention in recent years, the majority of existing work focuses on applying LDP to complex data and/or analysis tasks. In this paper, we point out that the fundamental problem of collecting multidimensional data under LDP has not been addressed sufficiently, and there remains much room for improvement even for basic tasks such as computing the mean value over a single numeric attribute under LDP. Motivated by this, we first propose novel LDP mechanisms for collecting a numeric attribute, whose accuracy is at least no worse (and usually better) than existing solutions in terms of worst-case noise variance. Then, we extend these mechanisms to multidimensional data that can contain both numeric and categorical attributes, where our mechanisms always outperform existing solutions regarding worst-case noise variance. As a case study, we apply our solutions to build an LDP-compliant stochastic gradient descent algorithm (SGD), which powers many important machine learning tasks. Experiments using real datasets confirm the effectiveness of our methods, and their advantages over existing solutions.
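The abstract's core setting — each user randomizes a numeric value locally so that the aggregator can still estimate the mean — can be illustrated with a classic one-bit perturbation in the style of Duchi et al. This is a sketch of the baseline setting only, not the mechanisms proposed in the paper (the paper's contribution is precisely to improve on the worst-case noise variance of such baselines); the function name and parameters are illustrative.

```python
import math
import random

def perturb(v, eps):
    """One-bit eps-LDP report for a value v in [-1, 1] (Duchi-style baseline).

    The report is always +c or -c, with probabilities chosen so that
    E[report] = v; the aggregator can therefore estimate the mean
    without ever seeing a raw value.
    """
    e = math.exp(eps)
    c = (e + 1) / (e - 1)                    # output magnitude
    p = 0.5 + (e - 1) / (2 * (e + 1)) * v    # Pr[report = +c]
    return c if random.random() < p else -c

# Aggregator side: averaging the randomized reports estimates the true mean.
random.seed(0)
reports = [perturb(0.3, eps=1.0) for _ in range(20000)]
est = sum(reports) / len(reports)            # close to 0.3
```

Each report takes only two possible values, and the likelihood ratio between any two inputs is at most e^eps, which is the LDP guarantee; the cost is per-user noise variance on the order of ((e^eps + 1)/(e^eps - 1))^2, the worst-case quantity the paper's mechanisms are designed to reduce.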

Original language: English
Title of host publication: Proceedings - 2019 IEEE 35th International Conference on Data Engineering, ICDE 2019
Publisher: IEEE Computer Society
Pages: 638-649
Number of pages: 12
ISBN (Electronic): 9781538674741
DOI: 10.1109/ICDE.2019.00063
Publication status: Published - 1 Apr 2019
Event: 35th IEEE International Conference on Data Engineering, ICDE 2019 - Macau, China
Duration: 8 Apr 2019 - 11 Apr 2019

Publication series

Name: Proceedings - International Conference on Data Engineering
Volume: 2019-April
ISSN (Print): 1084-4627

Conference

Conference: 35th IEEE International Conference on Data Engineering, ICDE 2019
Country: China
City: Macau
Period: 8/4/19 - 11/4/19


Keywords

  • Local differential privacy
  • Multidimensional data
  • Stochastic gradient descent

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems

Cite this

Wang, N., Xiao, X., Yang, Y., Zhao, J., Hui, S. C., Shin, H., ... Yu, G. (2019). Collecting and analyzing multidimensional data with local differential privacy. In Proceedings - 2019 IEEE 35th International Conference on Data Engineering, ICDE 2019 (pp. 638-649). [8731512] (Proceedings - International Conference on Data Engineering; Vol. 2019-April). IEEE Computer Society. https://doi.org/10.1109/ICDE.2019.00063

