Schema matching and embedded value mapping for databases with opaque column names and mixed continuous and discrete-valued data fields

Anuj Jaiswal, David J. Miller, Prasenjit Mitra

Research output: Contribution to journalArticle

3 Citations (Scopus)

Abstract

Schema matching and value mapping across two information sources, such as databases, are critical information aggregation tasks. Before data can be integrated from multiple tables, the columns and values within the tables must be matched. The complexities of both these problems grow quickly with the number of attributes to be matched and due to multiple semantics of data values. Traditional research has mostly tackled schema matching and value mapping independently, and for categorical (discrete-valued) attributes. We propose novel methods that leverage value mappings to enhance schema matching in the presence of opaque column names for schemas consisting of both continuous and discrete-valued attributes. An additional source of complexity is that a discrete-valued attribute in one schema could in fact be a quantized, encoded version of a continuous-valued attribute in the other schema. In our approach, which can tackle both "onto" and bijective schema matching, the fitness objective for matching a pair of attributes from two schemas exploits the statistical distribution over values within the two attributes. Suitable fitness objectives are based on Euclidean-distance and the data log-likelihood, both of which are applied in our experimental study. A heuristic local descent optimization strategy that uses two-opt switching to optimize attribute matches, while simultaneously embedding value mappings, is applied for our matching methods. Our experiments show that the proposed techniques matched mixed continuous and discrete-valued attribute schemas with high accuracy and, thus, should be a useful addition to a framework of (semi) automated tools for data alignment.

Original languageEnglish
Article number2
JournalACM Transactions on Database Systems
Volume38
Issue number1
DOIs
Publication statusPublished - Apr 2013
Externally publishedYes

Fingerprint

Agglomeration
Semantics
Experiments

Keywords

  • Database alignment
  • Embedded value mapping
  • Schema matching

ASJC Scopus subject areas

  • Information Systems

Cite this

Schema matching and embedded value mapping for databases with opaque column names and mixed continuous and discrete-valued data fields. / Jaiswal, Anuj; Miller, David J.; Mitra, Prasenjit.

In: ACM Transactions on Database Systems, Vol. 38, No. 1, 2, 04.2013.

Research output: Contribution to journalArticle

@article{fc938264412d435689c84ddf5da049e4,
title = "Schema matching and embedded value mapping for databases with opaque column names and mixed continuous and discrete-valued data fields",
abstract = "Schema matching and value mapping across two information sources, such as databases, are critical information aggregation tasks. Before data can be integrated from multiple tables, the columns and values within the tables must be matched. The complexities of both these problems grow quickly with the number of attributes to be matched and due to multiple semantics of data values. Traditional research has mostly tackled schema matching and value mapping independently, and for categorical (discrete-valued) attributes. We propose novel methods that leverage value mappings to enhance schema matching in the presence of opaque column names for schemas consisting of both continuous and discrete-valued attributes. An additional source of complexity is that a discrete-valued attribute in one schema could in fact be a quantized, encoded version of a continuous-valued attribute in the other schema. In our approach, which can tackle both {"}onto{"} and bijective schema matching, the fitness objective for matching a pair of attributes from two schemas exploits the statistical distribution over values within the two attributes. Suitable fitness objectives are based on Euclidean-distance and the data log-likelihood, both of which are applied in our experimental study. A heuristic local descent optimization strategy that uses two-opt switching to optimize attribute matches, while simultaneously embedding value mappings, is applied for our matching methods. Our experiments show that the proposed techniques matched mixed continuous and discrete-valued attribute schemas with high accuracy and, thus, should be a useful addition to a framework of (semi) automated tools for data alignment.",
keywords = "Database alignment, Embedded value mapping, Schema matching",
author = "Anuj Jaiswal and Miller, {David J.} and Prasenjit Mitra",
year = "2013",
month = "4",
doi = "10.1145/2445583.2445585",
language = "English",
volume = "38",
journal = "ACM Transactions on Database Systems",
issn = "0362-5915",
publisher = "Association for Computing Machinery (ACM)",
number = "1",

}

TY - JOUR

T1 - Schema matching and embedded value mapping for databases with opaque column names and mixed continuous and discrete-valued data fields

AU - Jaiswal, Anuj

AU - Miller, David J.

AU - Mitra, Prasenjit

PY - 2013/4

Y1 - 2013/4

N2 - Schema matching and value mapping across two information sources, such as databases, are critical information aggregation tasks. Before data can be integrated from multiple tables, the columns and values within the tables must be matched. The complexities of both these problems grow quickly with the number of attributes to be matched and due to multiple semantics of data values. Traditional research has mostly tackled schema matching and value mapping independently, and for categorical (discrete-valued) attributes. We propose novel methods that leverage value mappings to enhance schema matching in the presence of opaque column names for schemas consisting of both continuous and discrete-valued attributes. An additional source of complexity is that a discrete-valued attribute in one schema could in fact be a quantized, encoded version of a continuous-valued attribute in the other schema. In our approach, which can tackle both "onto" and bijective schema matching, the fitness objective for matching a pair of attributes from two schemas exploits the statistical distribution over values within the two attributes. Suitable fitness objectives are based on Euclidean-distance and the data log-likelihood, both of which are applied in our experimental study. A heuristic local descent optimization strategy that uses two-opt switching to optimize attribute matches, while simultaneously embedding value mappings, is applied for our matching methods. Our experiments show that the proposed techniques matched mixed continuous and discrete-valued attribute schemas with high accuracy and, thus, should be a useful addition to a framework of (semi) automated tools for data alignment.

AB - Schema matching and value mapping across two information sources, such as databases, are critical information aggregation tasks. Before data can be integrated from multiple tables, the columns and values within the tables must be matched. The complexities of both these problems grow quickly with the number of attributes to be matched and due to multiple semantics of data values. Traditional research has mostly tackled schema matching and value mapping independently, and for categorical (discrete-valued) attributes. We propose novel methods that leverage value mappings to enhance schema matching in the presence of opaque column names for schemas consisting of both continuous and discrete-valued attributes. An additional source of complexity is that a discrete-valued attribute in one schema could in fact be a quantized, encoded version of a continuous-valued attribute in the other schema. In our approach, which can tackle both "onto" and bijective schema matching, the fitness objective for matching a pair of attributes from two schemas exploits the statistical distribution over values within the two attributes. Suitable fitness objectives are based on Euclidean-distance and the data log-likelihood, both of which are applied in our experimental study. A heuristic local descent optimization strategy that uses two-opt switching to optimize attribute matches, while simultaneously embedding value mappings, is applied for our matching methods. Our experiments show that the proposed techniques matched mixed continuous and discrete-valued attribute schemas with high accuracy and, thus, should be a useful addition to a framework of (semi) automated tools for data alignment.

KW - Database alignment

KW - Embedded value mapping

KW - Schema matching

UR - http://www.scopus.com/inward/record.url?scp=84878330002&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84878330002&partnerID=8YFLogxK

U2 - 10.1145/2445583.2445585

DO - 10.1145/2445583.2445585

M3 - Article

VL - 38

JO - ACM Transactions on Database Systems

JF - ACM Transactions on Database Systems

SN - 0362-5915

IS - 1

M1 - 2

ER -