Identifying value mappings for data integration: An unsupervised approach

Jaewoo Kang, Dongwon Lee, Prasenjit Mitra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

The Web is a distributed network of information sources where the individual sources are autonomously created and maintained. Consequently, syntactic and semantic heterogeneity of data among sources abounds. Most current data cleaning solutions assume that the data values referencing the same object bear some textual similarity. However, this assumption is often violated in practice. "Two-door front wheel drive" can be represented as "2DR-FWD" or "R2FD", or even as "CAR TYPE 3" in different data sources. To address this problem, we propose a novel two-step automated technique that exploits statistical dependency structures among objects, which are invariant to the tokens representing the objects. The algorithm achieved high accuracy in our empirical study, suggesting that it can be a useful addition to existing information integration techniques.
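
The abstract does not spell out the two steps, so the following is only a toy illustration of the underlying idea, not the authors' algorithm. It assumes each source has one column with opaque codes and a second column (a hypothetical price band) whose values are already comparable across sources. It then picks the code-to-code mapping under which the joint value distributions of the two sources agree best. All rows, the "4DR-AWD" and "CAR TYPE 7" codes, and the function names are made up for the example.

```python
from collections import Counter
from itertools import permutations


def joint_distribution(rows):
    """Relative frequency of each (code, band) pair in a two-column table."""
    counts = Counter(rows)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def best_value_mapping(rows_s, rows_t):
    """Brute-force search for the mapping of codes in source S to codes in
    source T that minimises the L1 distance between joint distributions.

    Assumption: the second column uses the same values in both sources.
    Returns None if S has more distinct codes than T.
    """
    p_s, p_t = joint_distribution(rows_s), joint_distribution(rows_t)
    codes_s = sorted({code for code, _ in rows_s})
    codes_t = sorted({code for code, _ in rows_t})
    bands = sorted({b for _, b in rows_s} | {b for _, b in rows_t})

    best, best_cost = None, float("inf")
    for perm in permutations(codes_t, len(codes_s)):
        mapping = dict(zip(codes_s, perm))
        cost = sum(
            abs(p_s.get((code, b), 0.0) - p_t.get((mapping[code], b), 0.0))
            for code in codes_s
            for b in bands
        )
        if cost < best_cost:
            best, best_cost = mapping, cost
    return best


# Made-up toy data: (body-type code, price band) pairs from two sources.
rows_s = (
    [("2DR-FWD", "low")] * 6 + [("2DR-FWD", "high")] * 1
    + [("4DR-AWD", "low")] * 1 + [("4DR-AWD", "high")] * 4
)
rows_t = (
    [("CAR TYPE 3", "low")] * 11 + [("CAR TYPE 3", "high")] * 2
    + [("CAR TYPE 7", "low")] * 3 + [("CAR TYPE 7", "high")] * 8
)

mapping = best_value_mapping(rows_s, rows_t)
print(mapping)
```

The codes share no characters, yet the mapping is recovered because renaming a code does not change how often it co-occurs with the other column. The exhaustive search is exponential in the number of codes and is only meant to make that invariance concrete.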

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 544-551
Number of pages: 8
Volume: 3806 LNCS
DOIs: https://doi.org/10.1007/11581062_46
Publication status: Published - 2005
Externally published: Yes
Event: 6th International Conference on Web Information Systems Engineering, WISE 2005 - New York, NY
Duration: 20 Nov 2005 → 22 Nov 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3806 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 6th International Conference on Web Information Systems Engineering, WISE 2005
City: New York, NY
Period: 20/11/05 → 22/11/05

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Kang, J., Lee, D., & Mitra, P. (2005). Identifying value mappings for data integration: An unsupervised approach. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3806 LNCS, pp. 544-551). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3806 LNCS). https://doi.org/10.1007/11581062_46