Privacy preserving decision tree learning over multiple parties

F. Emekci, O. D. Sahin, D. Agrawal, A. El Abbadi

Research output: Contribution to journal › Article

72 Citations (Scopus)

Abstract

Data mining over multiple data sources has emerged as an important practical problem with applications in different areas such as data streams, data warehouses, and bioinformatics. Although the data sources are willing to run data mining algorithms in these cases, they do not want to reveal any extra information about their data to other sources due to legal or competition concerns. One possible solution to this problem is to use cryptographic methods. However, the computation and communication complexity of such solutions renders them impractical when a large number of data sources are involved. In this paper, we consider a scenario where multiple data sources are willing to run data mining algorithms over the union of their data as long as each data source is guaranteed that its information that does not pertain to another data source will not be revealed. We focus on the classification problem in particular and present an efficient algorithm for building a decision tree over an arbitrary number of distributed sources in a privacy-preserving manner using the ID3 algorithm.
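The core idea in the abstract, choosing ID3 splits from counts aggregated over several parties without any party revealing its own counts, can be illustrated with a minimal sketch. The abstract does not name the aggregation protocol, so the additive secret sharing used below is an assumption chosen for illustration and is not necessarily the paper's method. All names (`MODULUS`, `share`, `secure_sum`, `aggregate_counts`, `information_gain`) are hypothetical, and the parties are simulated in a single process.

```python
import math
import random

# Assumption: a modulus far larger than any possible record count.
MODULUS = 2**61 - 1


def share(value, n_parties, rng):
    """Split value into n additive shares that sum to value mod MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(local_values, rng):
    """Simulate a secure sum: party i sends its j-th share to party j,
    each party publishes only the sum of the shares it received."""
    n = len(local_values)
    all_shares = [share(v, n, rng) for v in local_values]
    partial = [sum(all_shares[i][j] for i in range(n)) % MODULUS
               for j in range(n)]
    return sum(partial) % MODULUS


def aggregate_counts(party_counts, rng):
    """party_counts: one {attribute value: [count per class]} dict per party.
    Returns the same structure with counts summed over all parties."""
    values = sorted(party_counts[0])
    n_classes = len(party_counts[0][values[0]])
    return {
        v: [secure_sum([p[v][k] for p in party_counts], rng)
            for k in range(n_classes)]
        for v in values
    }


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(counts_by_value):
    """ID3 information gain of an attribute from aggregated class counts."""
    class_totals = [sum(col) for col in zip(*counts_by_value.values())]
    total = sum(class_totals)
    remainder = sum(sum(c) / total * entropy(c)
                    for c in counts_by_value.values())
    return entropy(class_totals) - remainder


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up local counts [class yes, class no] for one attribute.
    parties = [
        {"sunny": [2, 1], "rainy": [0, 3]},
        {"sunny": [1, 0], "rainy": [1, 2]},
        {"sunny": [3, 1], "rainy": [0, 2]},
    ]
    totals = aggregate_counts(parties, rng)
    print(totals, information_gain(totals))
```

In this sketch only the aggregated counts feed the gain computation, so ID3 would pick the same split as it would on the pooled data. A real deployment would also need secure channels and a stated collusion model, which are outside the scope of this illustration.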

Original language: English
Pages (from-to): 348-361
Number of pages: 14
Journal: Data and Knowledge Engineering
Volume: 63
Issue number: 2
Publication status: Published - 1 Nov 2007


Keywords

  • Data mining
  • Data privacy and security
  • ID3

ASJC Scopus subject areas

  • Information Systems and Management
