Behavior based record linkage

Mohamed Yakout, Ahmed Elmagarmid, Hazem Elmeleegy, Mourad Ouzzani, Alan Qi

Research output: Chapter in Book/Report/Conference proceeding › Chapter

28 Citations (Scopus)

Abstract

In this paper, we present a new record linkage approach that uses entity behavior to decide whether potentially different entities are in fact the same. An entity's behavior is extracted from a transaction log that records the entity's actions with respect to a given data source. The core of our approach is a technique that merges the behaviors of two possibly matching entities and uses the gain in recognizing behavior patterns as their matching score. The idea is that if the merged behavior is well recognized, then the two original behaviors most likely belong to the same entity, because the behavior becomes more complete after the merge. We present the algorithms needed to model entities' behavior and to compute a matching score for them. To improve computational efficiency, we precede the actual matching phase with a fast candidate generation phase that uses a "quick and dirty" matching method. Extensive experiments on real data show that our approach can significantly enhance record linkage quality while remaining practical for large transaction logs.
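
As a rough illustration of the merge-and-score idea described in the abstract, the Python sketch below scores a pair of entities by how much more regular their combined activity looks than either log on its own. This is not the authors' algorithm: the abstract does not specify the behavior model, so the `regularity` measure (evenness of gaps between action timestamps), the function names, and the sample data are all assumptions made for illustration only.

```python
from statistics import mean, pstdev


def regularity(times):
    """Toy 'pattern recognizability' score (an assumption, not the paper's model).

    Returns 1.0 when the gaps between consecutive actions are perfectly even,
    and a smaller positive value as the gaps become more erratic.
    """
    ts = sorted(times)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return 0.0  # too little evidence to recognize any pattern
    cv = pstdev(gaps) / mean(gaps)  # coefficient of variation of the gaps
    return 1.0 / (1.0 + cv)


def merge_gain(times_a, times_b):
    """Matching score: gain in regularity after merging the two logs."""
    merged = list(times_a) + list(times_b)
    return regularity(merged) - max(regularity(times_a), regularity(times_b))


# Hypothetical transaction days for three entities.
entity_a = [0, 7, 28, 35]   # a weekly pattern with two weeks missing
entity_b = [14, 21, 42]     # visits that fall on some of the missing weeks
entity_c = [3, 4, 30]       # unrelated, irregular activity

print(merge_gain(entity_a, entity_b))  # positive: merged log is more complete
print(merge_gain(entity_a, entity_c))  # negative: merging blurs the pattern
```

In this toy setting, merging `entity_a` with `entity_b` yields an evenly spaced weekly log, so the gain is positive, while merging with `entity_c` makes the pattern harder to recognize and the gain is negative.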

Original language: English
Title of host publication: Proceedings of the VLDB Endowment
Pages: 439-448
Number of pages: 10
Volume: 3
Edition: 1
Publication status: Published - Sep 2010

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

Yakout, M., Elmagarmid, A., Elmeleegy, H., Ouzzani, M., & Qi, A. (2010). Behavior based record linkage. In Proceedings of the VLDB Endowment (1 ed., Vol. 3, pp. 439-448)

TY - CHAP

T1 - Behavior based record linkage

AU - Yakout, Mohamed

AU - Elmagarmid, Ahmed

AU - Elmeleegy, Hazem

AU - Ouzzani, Mourad

AU - Qi, Alan

PY - 2010/9

Y1 - 2010/9

UR - http://www.scopus.com/inward/record.url?scp=84856597650&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84856597650&partnerID=8YFLogxK

M3 - Chapter

AN - SCOPUS:84856597650

VL - 3

SP - 439

EP - 448

BT - Proceedings of the VLDB Endowment

ER -