Efficient and Practical Approach for Private Record Linkage

Mohamed Yakout, Mikhail J. Atallah, Ahmed Elmagarmid

Research output: Contribution to journalArticle

9 Citations (Scopus)

Abstract

Record linkage is used to associate entities from multiple data sources. For example, two organizations contemplating a merger may want to know how common their customer bases are so that they may better assess the benefits of the merger. Another example is a database of people who are forbidden from a certain activity by regulators, may need to be compared to a list of people engaged in that activity. The autonomous entities who wish to carry out the record matching computation are often reluctant to fully share their data; they fear losing control over its subsequent dissemination and usage, or they want to insure privacy because the data is proprietary or confidential, and/or they are cautious simply because privacy laws forbid its disclosure or regulate the form of that disclosure. In such cases, the problem of carrying out the linkage computation without full data exchange has been called private record linkage. Previous private record linkage techniques have made use of a third party. We provide efficient techniques for private record linkage that improve on previous work in that (1) our techniques make no use of a third party, and (2) they achieve much better performance than previous schemes in terms of their execution time while maintaining acceptable quality of output compared to nonprivacy settings. Our protocol consists of two phases. The first phase primarily produces candidate record pairs for matching, by carrying out a very fast (but not accurate) matching between such pairs of records. The second phase is a novel protocol for efficiently computing distances between each candidate pair (without any expensive cryptographic operations such as modular exponentiations). Our experimental evaluation of our approach validates these claims. Categories and Subject Descriptors: H.2.0 [Database Management]: General—Security, integrity, and protection; H.3.4 [Information Storage and Retrieval]: Systems and Software—Performance evaluation (efficiency and effectiveness); K.6.4 [Management of Computing and Information Systems]: System Management—Quality assurance.

Original languageEnglish
Pages (from-to)1-28
Number of pages28
JournalJournal of Data and Information Quality
Volume3
Issue number3
DOIs
Publication statusPublished - 1 Aug 2012
Externally publishedYes

    Fingerprint

Keywords

  • Algorithms
  • Design
  • Experimentation
  • integration
  • linkage
  • privacy
  • private information retrieval
  • private linkage
  • Record linkage
  • secure scalar product

ASJC Scopus subject areas

  • Information Systems
  • Information Systems and Management

Cite this