Entity Identification in Database Integration

Ee Peng Lim, Jaideep Srivastava, Satya Prabhakar, James Richardson

Research output: Contribution to journal › Article

20 Citations (Scopus)

Abstract

The objective of entity identification is to determine the correspondence between object instances from more than one database. This paper examines the problem at the instance level, assuming that schema-level heterogeneity has been resolved a priori. Soundness and completeness are defined as the desired properties of any entity-identification technique. To achieve soundness, a set of identity and distinctness rules has to be established for the entities in the integrated world. We then propose the use of an extended key, which is the union of keys (and possibly other attributes) from the relations to be matched, and its corresponding identity rule to determine the equivalence between tuples from relations that may not share any common key. Instance-level functional dependencies (ILFDs), a form of semantic constraint information about real-world entities, are used to derive the missing extended-key attribute values of a tuple. Formal properties of ILFDs are derived. Results from a Prolog-based prototype entity-identification system are presented.
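The abstract describes the extended-key approach only in outline. The Python sketch below illustrates the general idea under assumptions of our own. The two restaurant relations, their attribute names (`name`, `street`, `speciality`, `cuisine`), and the ILFDs are hypothetical examples, not data from the paper. ILFDs are modeled as simple rules of the form "if attribute A has value a, then attribute B has value b". The identity rule is simplified to "two tuples denote the same entity if they agree on every extended-key attribute". Distinctness rules and the paper's Prolog prototype are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ILFD:
    """Hypothetical encoding of an instance-level functional dependency:
    if attribute `attr` equals `value`, then `target` equals `target_value`."""
    attr: str
    value: str
    target: str
    target_value: str


def derive(tup, ilfds):
    """Fill in missing attribute values by applying ILFDs until no rule fires."""
    t = dict(tup)
    changed = True
    while changed:
        changed = False
        for f in ilfds:
            if t.get(f.attr) == f.value and t.get(f.target) is None:
                t[f.target] = f.target_value
                changed = True
    return t


def identical(t1, t2, extended_key):
    """Simplified identity rule: every extended-key attribute is known and equal."""
    return all(t1.get(a) is not None and t1.get(a) == t2.get(a) for a in extended_key)


def match(r1, r2, key1, key2, ilfds):
    """Return index pairs (i, j) of tuples judged to denote the same entity."""
    extended_key = sorted(set(key1) | set(key2))  # union of the two relations' keys
    d1 = [derive(t, ilfds) for t in r1]
    d2 = [derive(t, ilfds) for t in r2]
    return [
        (i, j)
        for i, a in enumerate(d1)
        for j, b in enumerate(d2)
        if identical(a, b, extended_key)
    ]


# Hypothetical sample data: R1 is keyed on (name, street), R2 on (name, cuisine).
R1 = [
    {"name": "Twin Dragon", "street": "Main St", "speciality": "Mughalai"},
    {"name": "Twin Dragon", "street": "Oak Ave", "speciality": "Hunan"},
]
R2 = [
    {"name": "Twin Dragon", "street": "Main St", "cuisine": "Indian"},
    {"name": "Twin Dragon", "street": "Oak Ave", "cuisine": "Chinese"},
]
# Hypothetical ILFDs linking a restaurant's speciality to its cuisine.
ILFDS = [
    ILFD("speciality", "Hunan", "cuisine", "Chinese"),
    ILFD("speciality", "Mughalai", "cuisine", "Indian"),
]

if __name__ == "__main__":
    print(match(R1, R2, ["name", "street"], ["name", "cuisine"], ILFDS))  # [(0, 0), (1, 1)]
    print(match(R1, R2, ["name", "street"], ["name", "cuisine"], []))     # []
```

With the ILFDs, each tuple in `R1` gains a `cuisine` value and can be compared on the full extended key (`cuisine`, `name`, `street`). Without them, `cuisine` stays missing in `R1`, so no pair can be declared identical. This is the gap that ILFDs are meant to close.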

Original language: English
Pages (from-to): 1-38
Number of pages: 38
Journal: Information Sciences
Volume: 89
Issue number: 1-2
Publication status: Published (Feb 1996)
Externally published: Yes

Fingerprint

Identification (control systems)
Semantics
Soundness
Attribute
Functional Dependency
Prolog
System Identification
Schema
Completeness
Union
Correspondence
Equivalence
Prototype
Data base

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Information Systems
  • Information Systems and Management
  • Statistics, Probability and Uncertainty
  • Electrical and Electronic Engineering
  • Statistics and Probability

Cite this

Lim, E. P., Srivastava, J., Prabhakar, S., & Richardson, J. (1996). Entity Identification in Database Integration. Information Sciences, 89(1-2), 1-38.

Entity Identification in Database Integration. / Lim, Ee Peng; Srivastava, Jaideep; Prabhakar, Satya; Richardson, James.

In: Information Sciences, Vol. 89, No. 1-2, 02.1996, p. 1-38.

Research output: Contribution to journal › Article

Lim, EP, Srivastava, J, Prabhakar, S & Richardson, J 1996, 'Entity Identification in Database Integration', Information Sciences, vol. 89, no. 1-2, pp. 1-38.
Lim EP, Srivastava J, Prabhakar S, Richardson J. Entity Identification in Database Integration. Information Sciences. 1996 Feb;89(1-2):1-38.
Lim, Ee Peng ; Srivastava, Jaideep ; Prabhakar, Satya ; Richardson, James. / Entity Identification in Database Integration. In: Information Sciences. 1996 ; Vol. 89, No. 1-2. pp. 1-38.
@article{ab78b2fff7a3463fb585741b1cfde5c6,
title = "Entity Identification in Database Integration",
abstract = "The objective of entity identification is to determine the correspondence between object instances from more than one database. This paper examines the problem at the instance level, assuming that schema-level heterogeneity has been resolved a priori. Soundness and completeness are defined as the desired properties of any entity-identification technique. To achieve soundness, a set of identity and distinctness rules has to be established for the entities in the integrated world. We then propose the use of an extended key, which is the union of keys (and possibly other attributes) from the relations to be matched, and its corresponding identity rule to determine the equivalence between tuples from relations that may not share any common key. Instance-level functional dependencies (ILFDs), a form of semantic constraint information about real-world entities, are used to derive the missing extended-key attribute values of a tuple. Formal properties of ILFDs are derived. Results from a Prolog-based prototype entity-identification system are presented.",
author = "Lim, {Ee Peng} and Jaideep Srivastava and Satya Prabhakar and James Richardson",
year = "1996",
month = "2",
language = "English",
volume = "89",
pages = "1--38",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier Inc.",
number = "1-2",

}

TY - JOUR

T1 - Entity Identification in Database Integration

AU - Lim, Ee Peng

AU - Srivastava, Jaideep

AU - Prabhakar, Satya

AU - Richardson, James

PY - 1996/2

Y1 - 1996/2

AB - The objective of entity identification is to determine the correspondence between object instances from more than one database. This paper examines the problem at the instance level, assuming that schema-level heterogeneity has been resolved a priori. Soundness and completeness are defined as the desired properties of any entity-identification technique. To achieve soundness, a set of identity and distinctness rules has to be established for the entities in the integrated world. We then propose the use of an extended key, which is the union of keys (and possibly other attributes) from the relations to be matched, and its corresponding identity rule to determine the equivalence between tuples from relations that may not share any common key. Instance-level functional dependencies (ILFDs), a form of semantic constraint information about real-world entities, are used to derive the missing extended-key attribute values of a tuple. Formal properties of ILFDs are derived. Results from a Prolog-based prototype entity-identification system are presented.

UR - http://www.scopus.com/inward/record.url?scp=0030083481&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0030083481&partnerID=8YFLogxK

M3 - Article

VL - 89

SP - 1

EP - 38

JO - Information Sciences

JF - Information Sciences

SN - 0020-0255

IS - 1-2

ER -