PrivGene

Differentially private model fitting using genetic algorithms

Jun Zhang, Xiaokui Xiao, Yin Yang, Zhenjie Zhang, Marianne Winslett

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

37 Citations (Scopus)

Abstract

ε-differential privacy is rapidly emerging as the state-of-the-art scheme for protecting individuals' privacy in published analysis results over sensitive data. The main idea is to randomly perturb the analysis results, such that any individual's presence in the data has negligible impact on the randomized results. This paper focuses on analysis tasks that involve model fitting, i.e., finding the parameters of a statistical model that best fit the dataset. For such tasks, the quality of the differentially private results depends on both the effectiveness of the model fitting algorithm and the amount of perturbation required to satisfy the privacy guarantees. Most previous studies start from a state-of-the-art, non-private model fitting algorithm and develop a differentially private version of it. Unfortunately, many model fitting algorithms require intensive perturbation to satisfy ε-differential privacy, leading to poor overall result quality. Motivated by this, we propose PrivGene, a general-purpose differentially private model fitting solution based on genetic algorithms (GA). PrivGene needs significantly less perturbation than previous methods, and it achieves higher overall result quality, even for model fitting tasks where GA would not be the first choice in the absence of privacy considerations. Further, PrivGene performs the random perturbations using a novel technique called the enhanced exponential mechanism, which improves over the exponential mechanism [26] by exploiting the special properties of model fitting tasks. As case studies, we apply PrivGene to three common analysis tasks involving model fitting: logistic regression, SVM classification, and k-means clustering. Extensive experiments using real data confirm the high result quality of PrivGene and its superiority over existing methods.
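For readers unfamiliar with the exponential mechanism mentioned in the abstract, the following is a minimal sketch of the standard (non-enhanced) mechanism: a candidate is selected with probability proportional to exp(ε · utility / (2 · sensitivity)). It is not the paper's enhanced exponential mechanism, whose details are not given on this page. The function names, the toy data, the candidate grid, and the clipped-error utility are all hypothetical choices made for illustration.

```python
import math
import random


def selection_probabilities(scores, epsilon, sensitivity):
    """Return standard exponential-mechanism probabilities for utility scores."""
    # Shift by the max score before exponentiating for numerical stability;
    # the shift cancels out when the weights are normalized.
    top = max(scores)
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(candidates, utility, epsilon, sensitivity, rng=random):
    """Sample one candidate; higher-utility candidates are exponentially more likely."""
    scores = [utility(c) for c in candidates]
    probs = selection_probabilities(scores, epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Hypothetical toy task: privately pick a location parameter for 1-D data.
data = [1.0, 2.0, 2.5, 3.0]
candidates = [0.0, 1.0, 2.0, 3.0, 4.0]


def fit_quality(theta):
    # Assumed utility: negative absolute error, with each record's contribution
    # clipped to 1 so that adding or removing one record changes the score
    # by at most 1 (i.e., sensitivity 1).
    return -sum(min(abs(x - theta), 1.0) for x in data)


chosen = exponential_mechanism(candidates, fit_quality, epsilon=1.0,
                               sensitivity=1.0, rng=random.Random(0))
print("privately selected parameter:", chosen)
```

A smaller ε flattens the selection probabilities toward uniform (more privacy, lower expected fit quality), while a larger ε concentrates them on the best-fitting candidates.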

Original language: English
Title of host publication: SIGMOD 2013 - International Conference on Management of Data
Pages: 665-676
Number of pages: 12
DOI: https://doi.org/10.1145/2463676.2465330
Publication status: Published - 2013
Externally published: Yes
Event: 2013 ACM SIGMOD Conference on Management of Data, SIGMOD 2013 - New York, NY, United States
Duration: 22 Jun 2013 to 27 Jun 2013

Other

Conference: 2013 ACM SIGMOD Conference on Management of Data, SIGMOD 2013
Country: United States
City: New York, NY
Period: 22/6/13 to 27/6/13

Keywords

  • Differential privacy
  • Genetic algorithms
  • Model fitting

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Zhang, J., Xiao, X., Yang, Y., Zhang, Z., & Winslett, M. (2013). PrivGene: Differentially private model fitting using genetic algorithms. In SIGMOD 2013 - International Conference on Management of Data (pp. 665-676) https://doi.org/10.1145/2463676.2465330

@inproceedings{1abe2821f37441b9a86ab4a4e05938ae,
title = "PrivGene: Differentially private model fitting using genetic algorithms",
keywords = "Differential privacy, Genetic algorithms, Model fitting",
author = "Jun Zhang and Xiaokui Xiao and Yin Yang and Zhenjie Zhang and Marianne Winslett",
year = "2013",
doi = "10.1145/2463676.2465330",
language = "English",
isbn = "9781450320375",
pages = "665--676",
booktitle = "SIGMOD 2013 - International Conference on Management of Data",

}

TY - GEN

T1 - PrivGene

T2 - Differentially private model fitting using genetic algorithms

AU - Zhang, Jun

AU - Xiao, Xiaokui

AU - Yang, Yin

AU - Zhang, Zhenjie

AU - Winslett, Marianne

PY - 2013

Y1 - 2013

KW - Differential privacy

KW - Genetic algorithms

KW - Model fitting

UR - http://www.scopus.com/inward/record.url?scp=84880547850&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84880547850&partnerID=8YFLogxK

U2 - 10.1145/2463676.2465330

DO - 10.1145/2463676.2465330

M3 - Conference contribution

SN - 9781450320375

SP - 665

EP - 676

BT - SIGMOD 2013 - International Conference on Management of Data

ER -