Index-based approximate XML joins

Sudipto Guha, Nick Koudas, Divesh Srivastava, Ting Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

XML data integration tools face a variety of challenges to their efficient and effective operation. Among these is the need to handle inconsistencies or mistakes present in the data sets. In this paper we study the problem of integrating XML data sources through index-assisted join operations, using notions of approximate match in the structure and content of XML documents as the join predicate. We show how a well-known and widely deployed index structure, namely the R-tree, can be adapted to improve the performance of such operations. We propose novel search and join algorithms for R-trees adapted to index XML document collections. We also propose novel optimization objectives for R-tree construction, making R-trees better suited for this application.
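The abstract does not say how documents are mapped into the index or which distance is used, so the following is only a minimal sketch of the general filter-and-refine idea behind an index-assisted approximate join, not the authors' algorithm. It assumes, purely for illustration, that each XML document has already been summarized as a 2-D numeric vector, that two documents "approximately match" when their vectors lie within a threshold `eps`, and that a single level of bounding rectangles stands in for a full R-tree. The names `build_leaves`, `min_dist`, and `approx_join` are hypothetical.

```python
from itertools import product
from math import hypot


def mbr(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of 2-D points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def min_dist(a, b):
    """Lower bound on the distance between any point in rectangle a and any in b."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)


def build_leaves(docs, fanout):
    """Pack (doc_id, vector) pairs into groups, each with its bounding rectangle.

    Assumption: a simple sort-and-chunk packing; a real R-tree would build
    several levels and choose groups more carefully.
    """
    ordered = sorted(docs, key=lambda d: d[1])
    leaves = []
    for i in range(0, len(ordered), fanout):
        group = ordered[i:i + fanout]
        leaves.append((mbr([vec for _, vec in group]), group))
    return leaves


def approx_join(left, right, eps, fanout=2):
    """Return sorted (left_id, right_id) pairs whose vectors are within eps."""
    result = []
    left_leaves = build_leaves(left, fanout)
    right_leaves = build_leaves(right, fanout)
    for (rect_a, group_a), (rect_b, group_b) in product(left_leaves, right_leaves):
        # Filter step: if the rectangles are farther apart than eps,
        # no pair of documents inside them can match.
        if min_dist(rect_a, rect_b) > eps:
            continue
        # Refine step: check the actual distance for the surviving pairs.
        for (id_a, va), (id_b, vb) in product(group_a, group_b):
            if hypot(va[0] - vb[0], va[1] - vb[1]) <= eps:
                result.append((id_a, id_b))
    return sorted(result)


# Hypothetical document summaries: (document id, feature vector).
left_docs = [("a1", (1.0, 1.0)), ("a2", (2.0, 1.0)), ("a3", (9.0, 9.0))]
right_docs = [("b1", (1.1, 0.9)), ("b2", (8.8, 9.3)), ("b3", (5.0, 5.0))]
matches = approx_join(left_docs, right_docs, eps=0.5)
```

With these made-up inputs, `matches` is `[("a1", "b1"), ("a3", "b2")]`. Because `min_dist` never overestimates the true distance, pruning a pair of rectangles cannot discard a valid match, so the result equals that of comparing every pair directly.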

Original language: English
Title of host publication: Proceedings - International Conference on Data Engineering
Editors: U. Dayal, K. Ramamritham, T.M. Vijayaraman
Pages: 708-710
Number of pages: 3
DOIs: https://doi.org/10.1109/ICDE.2003.1260843
Publication status: Published - 1 Dec 2003
Externally published: Yes
Event: Nineteenth International Conference on Data Engineering - Bangalore, India
Duration: 5 Mar 2003 to 8 Mar 2003

Other

Other: Nineteenth International Conference on Data Engineering
Country: India
City: Bangalore
Period: 5/3/03 to 8/3/03

Fingerprint

XML
Data integration

ASJC Scopus subject areas

  • Software
  • Engineering(all)
  • Engineering (miscellaneous)

Cite this

Guha, S., Koudas, N., Srivastava, D., & Yu, T. (2003). Index-based approximate XML joins. In U. Dayal, K. Ramamritham, & T. M. Vijayaraman (Eds.), Proceedings - International Conference on Data Engineering (pp. 708-710) https://doi.org/10.1109/ICDE.2003.1260843

@inproceedings{8205e1d553e0442ebbec8e40d2b10b8c,
title = "Index-based approximate XML joins",
abstract = "XML data integration tools face a variety of challenges to their efficient and effective operation. Among these is the need to handle inconsistencies or mistakes present in the data sets. In this paper we study the problem of integrating XML data sources through index-assisted join operations, using notions of approximate match in the structure and content of XML documents as the join predicate. We show how a well-known and widely deployed index structure, namely the R-tree, can be adapted to improve the performance of such operations. We propose novel search and join algorithms for R-trees adapted to index XML document collections. We also propose novel optimization objectives for R-tree construction, making R-trees better suited for this application.",
author = "Sudipto Guha and Nick Koudas and Divesh Srivastava and Ting Yu",
year = "2003",
month = "12",
day = "1",
doi = "10.1109/ICDE.2003.1260843",
language = "English",
pages = "708--710",
editor = "U. Dayal and K. Ramamritham and T.M. Vijayaraman",
booktitle = "Proceedings - International Conference on Data Engineering",

}

TY - GEN

T1 - Index-based approximate XML joins

AU - Guha, Sudipto

AU - Koudas, Nick

AU - Srivastava, Divesh

AU - Yu, Ting

PY - 2003/12/1

Y1 - 2003/12/1

AB - XML data integration tools face a variety of challenges to their efficient and effective operation. Among these is the need to handle inconsistencies or mistakes present in the data sets. In this paper we study the problem of integrating XML data sources through index-assisted join operations, using notions of approximate match in the structure and content of XML documents as the join predicate. We show how a well-known and widely deployed index structure, namely the R-tree, can be adapted to improve the performance of such operations. We propose novel search and join algorithms for R-trees adapted to index XML document collections. We also propose novel optimization objectives for R-tree construction, making R-trees better suited for this application.

UR - http://www.scopus.com/inward/record.url?scp=0344496626&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0344496626&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2003.1260843

DO - 10.1109/ICDE.2003.1260843

M3 - Conference contribution

SP - 708

EP - 710

BT - Proceedings - International Conference on Data Engineering

A2 - Dayal, U.

A2 - Ramamritham, K.

A2 - Vijayaraman, T.M.

ER -