Uninterpreted schema matching with embedded value mapping under opaque column names and data values

Anuj Jaiswal, David J. Miller, Prasenjit Mitra

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

Schema matching and value mapping across two heterogeneous information sources are critical tasks in applications involving data integration, data warehousing, and federation of databases. Before data can be integrated from multiple tables, the columns and the values appearing in the tables must be matched. The complexity of the problem grows quickly with the number of data attributes (columns) to be matched and with the multiple semantics of data values. Traditional research has tackled schema matching and value mapping independently. We propose a novel method that optimizes embedded value mappings to enhance schema matching in the presence of opaque data values and column names. In this approach, the fitness objective for matching a pair of attributes from two schemas depends on the value mapping function for each of the two attributes. Suitable fitness objectives include the Euclidean distance measure, which we use in our experimental study, as well as relative (cross) entropy. We propose a heuristic local descent optimization strategy that uses sorting and two-opt switching to jointly optimize value mappings and attribute matches. Our experiments show that our proposed technique outperforms earlier uninterpreted schema matching methods and thus should form a useful addition to a suite of (semi-)automated tools for resolving structural heterogeneity.
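The search described above can be illustrated with a deliberately simplified Python sketch. It is not the authors' implementation. It assumes three things: each opaque column is summarized only by the relative frequencies of its values, both tables have the same number of columns, and attribute matches and value mappings are one-to-one. The function names (`value_distribution`, `rank_value_mapping`, `pair_cost`, `two_opt_match`, `match_schemas`) are hypothetical. Under these assumptions, sorting handles the value mappings: pairing two frequency vectors by rank gives the Euclidean-optimal one-to-one value mapping for a fixed attribute pair, by the rearrangement inequality. Two-opt swaps then perform the local descent over attribute matches.

```python
from collections import Counter
from itertools import combinations
import math


def value_distribution(column):
    """Relative frequency of each distinct (opaque) value in a column."""
    counts = Counter(column)
    total = len(column)
    return {value: count / total for value, count in counts.items()}


def rank_value_mapping(dist_a, dist_b):
    """Map values of column A to values of column B by frequency rank.

    Values left over when A has more distinct values than B map to None.
    Ties keep first-seen order because Python's sort is stable.
    """
    ranked_a = sorted(dist_a, key=dist_a.get, reverse=True)
    ranked_b = sorted(dist_b, key=dist_b.get, reverse=True)
    return {a: (ranked_b[i] if i < len(ranked_b) else None)
            for i, a in enumerate(ranked_a)}


def pair_cost(dist_a, dist_b):
    """Euclidean distance between two columns' frequency vectors under the
    rank-based value mapping. Sorting both vectors the same way (and
    zero-padding the shorter one) gives the smallest distance over all
    one-to-one value mappings, by the rearrangement inequality."""
    fa = sorted(dist_a.values(), reverse=True)
    fb = sorted(dist_b.values(), reverse=True)
    n = max(len(fa), len(fb))
    fa += [0.0] * (n - len(fa))
    fb += [0.0] * (n - len(fb))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


def two_opt_match(cost):
    """Local descent over one-to-one attribute matches.

    cost[i][j] is the (lower-is-better) fitness of matching source
    attribute i to target attribute j. Starting from the identity
    matching, swap the targets of two source attributes whenever that
    lowers the total cost; stop when no single swap improves it.
    """
    n = len(cost)
    match = list(range(n))  # match[i] = index of target attribute for source i
    improved = True
    while improved:
        improved = False
        for i, k in combinations(range(n), 2):
            old = cost[i][match[i]] + cost[k][match[k]]
            new = cost[i][match[k]] + cost[k][match[i]]
            if new < old - 1e-12:  # strict improvement guarantees termination
                match[i], match[k] = match[k], match[i]
                improved = True
    return match


def match_schemas(table_a, table_b):
    """Hypothetical driver: tables are dicts of opaque column name -> values,
    with the same number of columns on both sides."""
    names_a, names_b = list(table_a), list(table_b)
    dists_a = [value_distribution(table_a[c]) for c in names_a]
    dists_b = [value_distribution(table_b[c]) for c in names_b]
    cost = [[pair_cost(da, db) for db in dists_b] for da in dists_a]
    match = two_opt_match(cost)
    return {names_a[i]: (names_b[j], rank_value_mapping(dists_a[i], dists_b[j]))
            for i, j in enumerate(match)}


if __name__ == "__main__":
    table_a = {"a1": ["u", "u", "u", "v"], "a2": ["s", "t", "w", "z"]}
    table_b = {"b1": ["k", "l", "m", "n"], "b2": ["q", "r", "q", "q"]}
    print(match_schemas(table_a, table_b))
    # a1 -> b2 with u->q and v->r; a2 -> b1
```

In this simplification each attribute pair's value mapping is computed independently. The "joint" optimization therefore reduces to an assignment problem over precomputed costs. A formulation in which value mappings interact, for example through multi-column statistics, would need a genuinely joint search. Two-opt descent can also stop at a local optimum. For the fixed-cost case shown here, an exact solver such as SciPy's `linear_sum_assignment` could replace the swap loop.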

Original language: English
Article number: 4799783
Pages (from-to): 291-304
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 22
Issue number: 2
DOI: 10.1109/TKDE.2009.69
Publication status: Published - Feb 2010
Externally published: Yes

Fingerprint

Data warehouses
Data integration
Sorting
Entropy
Semantics
Experiments

Keywords

  • Embedded schema matching with value mapping
  • Opaque conditions
  • Schema matching

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Information Systems
  • Computer Science Applications

Cite this

Uninterpreted schema matching with embedded value mapping under opaque column names and data values. / Jaiswal, Anuj; Miller, David J.; Mitra, Prasenjit.

In: IEEE Transactions on Knowledge and Data Engineering, Vol. 22, No. 2, 4799783, 02.2010, p. 291-304.

@article{e05c69af9bec410db963c6ec6ee5df4f,
title = "Uninterpreted schema matching with embedded value mapping under opaque column names and data values",
abstract = "Schema matching and value mapping across two heterogeneous information sources are critical tasks in applications involving data integration, data warehousing, and federation of databases. Before data can be integrated from multiple tables, the columns and the values appearing in the tables must be matched. The complexity of the problem grows quickly with the number of data attributes (columns) to be matched and with the multiple semantics of data values. Traditional research has tackled schema matching and value mapping independently. We propose a novel method that optimizes embedded value mappings to enhance schema matching in the presence of opaque data values and column names. In this approach, the fitness objective for matching a pair of attributes from two schemas depends on the value mapping function for each of the two attributes. Suitable fitness objectives include the Euclidean distance measure, which we use in our experimental study, as well as relative (cross) entropy. We propose a heuristic local descent optimization strategy that uses sorting and two-opt switching to jointly optimize value mappings and attribute matches. Our experiments show that our proposed technique outperforms earlier uninterpreted schema matching methods and thus should form a useful addition to a suite of (semi-)automated tools for resolving structural heterogeneity.",
keywords = "Embedded schema matching with value mapping, Opaque conditions, Schema matching",
author = "Jaiswal, Anuj and Miller, {David J.} and Mitra, Prasenjit",
year = "2010",
month = feb,
doi = "10.1109/TKDE.2009.69",
language = "English",
volume = "22",
pages = "291--304",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE Computer Society",
number = "2",

}

TY  - JOUR
T1  - Uninterpreted schema matching with embedded value mapping under opaque column names and data values
AU  - Jaiswal, Anuj
AU  - Miller, David J.
AU  - Mitra, Prasenjit
PY  - 2010/2
Y1  - 2010/2
N2  - Schema matching and value mapping across two heterogeneous information sources are critical tasks in applications involving data integration, data warehousing, and federation of databases. Before data can be integrated from multiple tables, the columns and the values appearing in the tables must be matched. The complexity of the problem grows quickly with the number of data attributes (columns) to be matched and with the multiple semantics of data values. Traditional research has tackled schema matching and value mapping independently. We propose a novel method that optimizes embedded value mappings to enhance schema matching in the presence of opaque data values and column names. In this approach, the fitness objective for matching a pair of attributes from two schemas depends on the value mapping function for each of the two attributes. Suitable fitness objectives include the Euclidean distance measure, which we use in our experimental study, as well as relative (cross) entropy. We propose a heuristic local descent optimization strategy that uses sorting and two-opt switching to jointly optimize value mappings and attribute matches. Our experiments show that our proposed technique outperforms earlier uninterpreted schema matching methods and thus should form a useful addition to a suite of (semi-)automated tools for resolving structural heterogeneity.
AB  - Schema matching and value mapping across two heterogeneous information sources are critical tasks in applications involving data integration, data warehousing, and federation of databases. Before data can be integrated from multiple tables, the columns and the values appearing in the tables must be matched. The complexity of the problem grows quickly with the number of data attributes (columns) to be matched and with the multiple semantics of data values. Traditional research has tackled schema matching and value mapping independently. We propose a novel method that optimizes embedded value mappings to enhance schema matching in the presence of opaque data values and column names. In this approach, the fitness objective for matching a pair of attributes from two schemas depends on the value mapping function for each of the two attributes. Suitable fitness objectives include the Euclidean distance measure, which we use in our experimental study, as well as relative (cross) entropy. We propose a heuristic local descent optimization strategy that uses sorting and two-opt switching to jointly optimize value mappings and attribute matches. Our experiments show that our proposed technique outperforms earlier uninterpreted schema matching methods and thus should form a useful addition to a suite of (semi-)automated tools for resolving structural heterogeneity.
KW  - Embedded schema matching with value mapping
KW  - Opaque conditions
KW  - Schema matching
UR  - http://www.scopus.com/inward/record.url?scp=75449088101&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=75449088101&partnerID=8YFLogxK
U2  - 10.1109/TKDE.2009.69
DO  - 10.1109/TKDE.2009.69
M3  - Article
VL  - 22
SP  - 291
EP  - 304
JO  - IEEE Transactions on Knowledge and Data Engineering
JF  - IEEE Transactions on Knowledge and Data Engineering
SN  - 1041-4347
IS  - 2
M1  - 4799783
ER  -