Privacy preserving decision tree learning over multiple parties

F. Emekci, O. D. Sahin, D. Agrawal, A. El Abbadi

Research output: Contribution to journal › Article

67 Citations (Scopus)

Abstract

Data mining over multiple data sources has emerged as an important practical problem with applications in different areas such as data streams, data-warehouses, and bioinformatics. Although the data sources are willing to run data mining algorithms in these cases, they do not want to reveal any extra information about their data to other sources due to legal or competition concerns. One possible solution to this problem is to use cryptographic methods. However, the computation and communication complexity of such solutions render them impractical when a large number of data sources are involved. In this paper, we consider a scenario where multiple data sources are willing to run data mining algorithms over the union of their data as long as each data source is guaranteed that its information that does not pertain to another data source will not be revealed. We focus on the classification problem in particular and present an efficient algorithm for building a decision tree over an arbitrary number of distributed sources in a privacy preserving manner using the ID3 algorithm.
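
The record describes the approach only at a high level. The sketch below illustrates one common way to build a multi-party ID3 split. It is not presented as the authors' protocol. It assumes that the data is horizontally partitioned, so each source holds complete rows, and that attribute domains and class labels are public. Each party's (attribute value, class) counts are combined with a plain additive secret-sharing secure sum, and the ID3 information gain of each candidate attribute is then computed from the aggregated counts. The function names, the `PRIME` modulus and the toy weather data are hypothetical. All parties run in one process here, whereas a real deployment would exchange shares over authenticated channels.

```python
import math
import secrets
from collections import Counter

# Hypothetical field modulus for additive secret sharing (a Mersenne prime).
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split a non-negative integer into n additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values):
    """Simulated secure sum: party i sends its j-th share to party j, each
    party publishes only the sum of the shares it received, and the published
    partial sums add up to the true total (individual inputs stay hidden)."""
    n = len(private_values)
    dealt = [share(v, n) for v in private_values]  # dealt[i][j]: party i -> party j
    partial_sums = [sum(dealt[i][j] for i in range(n)) % PRIME for j in range(n)]
    return sum(partial_sums) % PRIME


def entropy(counts):
    """Shannon entropy (bits) of a class-count vector with a positive total."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def global_counts(parties, attr, values, classes, label):
    """Securely aggregate (attribute value, class) counts across parties.
    Attribute domains and class labels are assumed to be public."""
    local = [Counter((row[attr], row[label]) for row in rows) for rows in parties]
    return {(v, c): secure_sum([cnt[(v, c)] for cnt in local])
            for v in values for c in classes}


def information_gain(counts, values, classes):
    """ID3 information gain of one attribute, computed from aggregated counts."""
    class_totals = [sum(counts[(v, c)] for v in values) for c in classes]
    n = sum(class_totals)
    gain = entropy(class_totals)
    for v in values:
        branch = [counts[(v, c)] for c in classes]
        if sum(branch):
            gain -= sum(branch) / n * entropy(branch)
    return gain


# Hypothetical toy data split across three sources.
domains = {"outlook": ["sunny", "overcast", "rain"], "windy": ["no", "yes"]}
classes = ["no", "yes"]
party_a = [{"outlook": "sunny", "windy": "no", "play": "no"},
           {"outlook": "rain", "windy": "no", "play": "yes"}]
party_b = [{"outlook": "sunny", "windy": "yes", "play": "no"},
           {"outlook": "overcast", "windy": "yes", "play": "yes"}]
party_c = [{"outlook": "rain", "windy": "yes", "play": "yes"}]
parties = [party_a, party_b, party_c]

gains = {a: information_gain(global_counts(parties, a, vals, classes, "play"),
                             vals, classes)
         for a, vals in domains.items()}
best = max(gains, key=gains.get)  # attribute chosen for the root split: "outlook"
```

In this sketch the aggregated counts become visible to every party, although no party learns another's per-source counts. A protocol that also hides the aggregates, or one that tolerates colluding or malicious parties, needs more machinery than is shown here. The same step would be repeated recursively on each branch to grow the full tree.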

Original language: English
Pages (from-to): 348-361
Number of pages: 14
Journal: Data and Knowledge Engineering
Volume: 63
Issue number: 2
DOI: 10.1016/j.datak.2007.02.004
Publication status: Published - 1 Nov 2007
Externally published: Yes

Keywords

  • Data mining
  • Data privacy and security
  • ID3

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Privacy preserving decision tree learning over multiple parties. / Emekci, F.; Sahin, O. D.; Agrawal, D.; El Abbadi, A.

In: Data and Knowledge Engineering, Vol. 63, No. 2, 01.11.2007, p. 348-361.

Emekci, F.; Sahin, O. D.; Agrawal, D.; El Abbadi, A. / Privacy preserving decision tree learning over multiple parties. In: Data and Knowledge Engineering. 2007; Vol. 63, No. 2. pp. 348-361.

@article{9e8f0c797a0848c39c15e05d5d3dfd98,
title = "Privacy preserving decision tree learning over multiple parties",
abstract = "Data mining over multiple data sources has emerged as an important practical problem with applications in different areas such as data streams, data-warehouses, and bioinformatics. Although the data sources are willing to run data mining algorithms in these cases, they do not want to reveal any extra information about their data to other sources due to legal or competition concerns. One possible solution to this problem is to use cryptographic methods. However, the computation and communication complexity of such solutions render them impractical when a large number of data sources are involved. In this paper, we consider a scenario where multiple data sources are willing to run data mining algorithms over the union of their data as long as each data source is guaranteed that its information that does not pertain to another data source will not be revealed. We focus on the classification problem in particular and present an efficient algorithm for building a decision tree over an arbitrary number of distributed sources in a privacy preserving manner using the ID3 algorithm.",
keywords = "Data mining, Data privacy and security, ID3",
author = "Emekci, F. and Sahin, {O. D.} and Agrawal, D. and {El Abbadi}, A.",
year = "2007",
month = "11",
day = "1",
doi = "10.1016/j.datak.2007.02.004",
language = "English",
volume = "63",
pages = "348--361",
journal = "Data and Knowledge Engineering",
issn = "0169-023X",
publisher = "Elsevier",
number = "2",

}

TY - JOUR

T1 - Privacy preserving decision tree learning over multiple parties

AU - Emekci, F.

AU - Sahin, O. D.

AU - Agrawal, D.

AU - El Abbadi, A.

PY - 2007/11/1

Y1 - 2007/11/1

AB - Data mining over multiple data sources has emerged as an important practical problem with applications in different areas such as data streams, data-warehouses, and bioinformatics. Although the data sources are willing to run data mining algorithms in these cases, they do not want to reveal any extra information about their data to other sources due to legal or competition concerns. One possible solution to this problem is to use cryptographic methods. However, the computation and communication complexity of such solutions render them impractical when a large number of data sources are involved. In this paper, we consider a scenario where multiple data sources are willing to run data mining algorithms over the union of their data as long as each data source is guaranteed that its information that does not pertain to another data source will not be revealed. We focus on the classification problem in particular and present an efficient algorithm for building a decision tree over an arbitrary number of distributed sources in a privacy preserving manner using the ID3 algorithm.

KW - Data mining

KW - Data privacy and security

KW - ID3

UR - http://www.scopus.com/inward/record.url?scp=34447254993&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34447254993&partnerID=8YFLogxK

U2 - 10.1016/j.datak.2007.02.004

DO - 10.1016/j.datak.2007.02.004

M3 - Article

VL - 63

SP - 348

EP - 361

JO - Data and Knowledge Engineering

JF - Data and Knowledge Engineering

SN - 0169-023X

IS - 2

ER -