Efficient and Practical Approach for Private Record Linkage

Mohamed Yakout, Mikhail J. Atallah, Ahmed Elmagarmid

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Record linkage is used to associate entities from multiple data sources. For example, two organizations contemplating a merger may want to know how much their customer bases overlap so that they can better assess the benefits of the merger. As another example, a database of people whom regulators forbid from a certain activity may need to be compared against a list of people engaged in that activity. The autonomous entities that wish to carry out the record-matching computation are often reluctant to share their data fully: they fear losing control over its subsequent dissemination and usage, they want to ensure privacy because the data is proprietary or confidential, and/or they are cautious simply because privacy laws forbid its disclosure or regulate the form of that disclosure. In such cases, the problem of carrying out the linkage computation without full data exchange has been called private record linkage. Previous private record linkage techniques have relied on a third party. We provide efficient techniques for private record linkage that improve on previous work in that (1) our techniques make no use of a third party, and (2) they achieve much better execution time than previous schemes while maintaining acceptable output quality compared to nonprivacy settings. Our protocol consists of two phases. The first phase primarily produces candidate record pairs for matching by carrying out a very fast (but not accurate) matching between pairs of records. The second phase is a novel protocol for efficiently computing distances between each candidate pair without any expensive cryptographic operations such as modular exponentiations. Our experimental evaluation validates these claims.
Categories and Subject Descriptors: H.2.0 [Database Management]: General—Security, integrity, and protection; H.3.4 [Information Storage and Retrieval]: Systems and Software—Performance evaluation (efficiency and effectiveness); K.6.4 [Management of Computing and Information Systems]: System Management—Quality assurance.
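The two-phase structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' protocol: the character-bigram record encoding, the norm-based candidate filter, and all function names (`bigram_vector`, `sq_norm`, `dot`, `candidate_pairs`, `distance`, `link`) are hypothetical choices made for illustration. Phase 1 keeps only pairs whose vector norms differ by at most a threshold t; by the reverse triangle inequality, | |x| - |y| | <= |x - y|, so this filter never discards a pair whose true distance is within t. Phase 2 expands the squared Euclidean distance as |x|^2 + |y|^2 - 2 x.y, so each party can compute its own norm locally and only the cross term x.y needs joint computation. In a private setting that term would come from a two-party secure scalar product; here `dot` computes it in the clear as a stand-in, so the sketch shows the data flow only and provides no privacy.

```python
import bisect
import math
from collections import Counter


def bigram_vector(record):
    """Hypothetical encoding: a record becomes a vector of character-bigram counts."""
    s = record.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def sq_norm(v):
    """Squared Euclidean norm; each party can compute this locally."""
    return sum(c * c for c in v.values())


def dot(u, v):
    """Cross term x.y. A private protocol would obtain this via a two-party
    secure scalar product; this placeholder computes it in the clear."""
    return sum(c * v[k] for k, c in u.items())  # Counter returns 0 for missing keys


def candidate_pairs(norms_a, norms_b, t):
    """Phase 1 (sketch): keep pairs whose norms differ by at most t.
    Since | |x| - |y| | <= |x - y|, in exact arithmetic no pair within
    distance t is dropped."""
    order = sorted(range(len(norms_b)), key=lambda j: norms_b[j])
    keys = [norms_b[j] for j in order]
    pairs = []
    for i, n in enumerate(norms_a):
        # Binary search for the band [n - t, n + t] in the sorted norms.
        lo = bisect.bisect_left(keys, n - t)
        hi = bisect.bisect_right(keys, n + t)
        pairs.extend((i, order[k]) for k in range(lo, hi))
    return pairs


def distance(u, v, sq_u, sq_v):
    """Phase 2 (sketch): |x - y|^2 = |x|^2 + |y|^2 - 2 x.y (exact for integer counts)."""
    return math.sqrt(sq_u + sq_v - 2 * dot(u, v))


def link(records_a, records_b, t):
    """Return (i, j, distance) for every pair of records within distance t."""
    va = [bigram_vector(r) for r in records_a]
    vb = [bigram_vector(r) for r in records_b]
    sa = [sq_norm(v) for v in va]
    sb = [sq_norm(v) for v in vb]
    candidates = candidate_pairs([math.sqrt(s) for s in sa],
                                 [math.sqrt(s) for s in sb], t)
    matches = []
    for i, j in candidates:
        d = distance(va[i], vb[j], sa[i], sb[j])
        if d <= t:
            matches.append((i, j, d))
    return matches
```

With a threshold of 2.0, `link(["john smith", "mary jones"], ["jon smith", "mary jone"], 2.0)` matches each name to its misspelled variant (distances sqrt(3) and 1.0) and rejects the cross pairs. Sorting one side by norm and bisecting means phase 1 avoids computing a distance for every pair, although in the worst case many candidates can still survive the filter.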

Original language: English
Pages (from-to): 1-28
Number of pages: 28
Journal: Journal of Data and Information Quality
Volume: 3
Issue number: 3
DOI: 10.1145/2287714.2287715
Publication status: Published, 1 Aug 2012
Externally published: Yes

Fingerprint

Information retrieval systems
Electronic data interchange
Information systems
Record linkage
Evaluation
Disclosure
Privacy
Mergers
Dissemination
Integrity
Data exchange
Database management
Assurance
Data sources
Data base
Linkage

Keywords

  • Algorithms
  • Design
  • Experimentation
  • integration
  • linkage
  • privacy
  • private information retrieval
  • private linkage
  • Record linkage
  • secure scalar product

ASJC Scopus subject areas

  • Information Systems
  • Information Systems and Management

Cite this

Efficient and Practical Approach for Private Record Linkage. / Yakout, Mohamed; Atallah, Mikhail J.; Elmagarmid, Ahmed.

In: Journal of Data and Information Quality, Vol. 3, No. 3, 01.08.2012, p. 1-28.


Yakout, Mohamed ; Atallah, Mikhail J. ; Elmagarmid, Ahmed. / Efficient and Practical Approach for Private Record Linkage. In: Journal of Data and Information Quality. 2012 ; Vol. 3, No. 3. pp. 1-28.
@article{dd136d74a1624a669a4efb8b7cfc4d52,
title = "Efficient and Practical Approach for Private Record Linkage",
keywords = "Algorithms, Design, Experimentation, integration, linkage, privacy, private information retrieval, private linkage, Record linkage, secure scalar product",
author = "Mohamed Yakout and Atallah, {Mikhail J.} and Ahmed Elmagarmid",
year = "2012",
month = "8",
day = "1",
doi = "10.1145/2287714.2287715",
language = "English",
volume = "3",
pages = "1--28",
journal = "Journal of Data and Information Quality",
issn = "1936-1955",
publisher = "Association for Computing Machinery (ACM)",
number = "3",

}

TY - JOUR

T1 - Efficient and Practical Approach for Private Record Linkage

AU - Yakout, Mohamed

AU - Atallah, Mikhail J.

AU - Elmagarmid, Ahmed

PY - 2012/8/1

Y1 - 2012/8/1

KW - Algorithms

KW - Design

KW - Experimentation

KW - integration

KW - linkage

KW - privacy

KW - private information retrieval

KW - private linkage

KW - Record linkage

KW - secure scalar product

UR - http://www.scopus.com/inward/record.url?scp=84962912015&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84962912015&partnerID=8YFLogxK

U2 - 10.1145/2287714.2287715

DO - 10.1145/2287714.2287715

M3 - Article

AN - SCOPUS:84962912015

VL - 3

SP - 1

EP - 28

JO - Journal of Data and Information Quality

JF - Journal of Data and Information Quality

SN - 1936-1955

IS - 3

ER -