Scalable filtering of multiple generalized-tree-pattern queries over XML streams

Songting Chen, Hua Gang Li, Jun'ichi Tatemura, Wang Pin Hsiung, Divyakant Agrawal, K. Selçuk Candan

Research output: Contribution to journal (Article)

16 Citations (Scopus)

Abstract

An XML publish/subscribe system needs to filter a large number of queries over XML streams. Most existing systems consider filtering only simple XPath statements. In this paper, we focus on filtering the more complex Generalized-Tree-Pattern (GTP) queries. Our filtering mechanism is based on a novel Tree-of-Path (TOP) encoding scheme, which compactly represents the path matches for the entire document. First, we show that the TOP encodings can be efficiently produced via shared bottom-up path matching. Second, with the aid of this TOP encoding, we can 1) achieve polynomial time and space complexity for postprocessing, 2) avoid redundant predicate evaluations, 3) enable an efficient, duplicate-free, merge-join-based algorithm for merging multiple encoded path matches, and 4) simplify the processing of GTP queries. Overall, our approach maximizes the sharing opportunities across queries by exploiting suffix as well as prefix sharing. At the same time, our TOP encodings allow efficient postprocessing for GTP queries. Extensive performance studies show that our system, GFilter, not only achieves significantly better filtering performance than state-of-the-art algorithms but is also capable of efficiently filtering the more complex GTP queries.
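The general idea of matching each root-to-leaf path of a tree pattern only once, sharing those path matches across queries, and combining them with a merge join can be illustrated with a toy Python sketch. It is not the TOP encoding or the GFilter algorithm. It assumes only queries of the hypothetical form //A[B][C] (some A element with a child B and a child C), and the names match_child_paths, merge_intersect and filter_queries are made up for this illustration.

```python
import io
import xml.etree.ElementTree as ET


def match_child_paths(xml_text, paths):
    """One streaming pass over the document for a set of (parent_tag, child_tag) paths.

    For each path //parent/child, returns the sorted, duplicate-free pre-order ids
    of the parent elements that have at least one matching child. Each distinct
    path is matched once, no matter how many queries contain it.
    """
    matches = {p: set() for p in paths}
    stack = []  # (pre-order id, tag) of the currently open elements
    next_id = 0
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if stack:
                parent_id, parent_tag = stack[-1]
                key = (parent_tag, elem.tag)
                if key in matches:
                    matches[key].add(parent_id)  # the set removes duplicates
            stack.append((next_id, elem.tag))
            next_id += 1
        else:
            stack.pop()
    return {p: sorted(ids) for p, ids in matches.items()}


def merge_intersect(left, right):
    """Merge join of two sorted, duplicate-free id lists: ids present in both."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return out


def filter_queries(xml_text, queries):
    """queries maps a name to (branch_tag, [child_tag, ...]), i.e. //branch[c1][c2]...

    Returns, per query, whether some branch element has all the required children.
    """
    needed = {(branch, c) for branch, children in queries.values() for c in children}
    matches = match_child_paths(xml_text, needed)
    result = {}
    for name, (branch, children) in queries.items():
        ids = matches[(branch, children[0])]
        for c in children[1:]:
            ids = merge_intersect(ids, matches[(branch, c)])
        result[name] = bool(ids)
    return result


doc = "<r><A><B/><C/></A><A><B/><D/></A></r>"
queries = {
    "Q1": ("A", ["B", "C"]),  # //A[B][C]
    "Q2": ("A", ["B", "D"]),  # //A[B][D]
    "Q3": ("A", ["C", "D"]),  # //A[C][D]
}
print(filter_queries(doc, queries))  # {'Q1': True, 'Q2': True, 'Q3': False}
```

Here the path //A/B is matched once and reused by Q1 and Q2. A real GTP filter would also have to handle descendant axes, predicates and deeper nesting, which this sketch omits.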

Original language: English
Article number: 4509431
Pages (from-to): 1627-1640
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 20
Issue number: 12
DOI: https://doi.org/10.1109/TKDE.2008.83
Publication status: Published - 1 Dec 2008
Externally published: Yes

Fingerprint

XML
Merging
Polynomials
Processing

Keywords

  • Generalized-tree-pattern queries
  • Result encoding
  • XML filtering
  • XML streams

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Information Systems
  • Computer Science Applications

Cite this

Chen, S., Li, H. G., Tatemura, J., Hsiung, W. P., Agrawal, D., & Candan, K. S. (2008). Scalable filtering of multiple generalized-tree-pattern queries over XML streams. IEEE Transactions on Knowledge and Data Engineering, 20(12), 1627-1640. [4509431]. https://doi.org/10.1109/TKDE.2008.83

Scalable filtering of multiple generalized-tree-pattern queries over XML streams. / Chen, Songting; Li, Hua Gang; Tatemura, Jun'ichi; Hsiung, Wang Pin; Agrawal, Divyakant; Candan, K. Selçuk.

In: IEEE Transactions on Knowledge and Data Engineering, Vol. 20, No. 12, 4509431, 01.12.2008, p. 1627-1640.

Research output: Contribution to journal (Article)

Chen, S, Li, HG, Tatemura, J, Hsiung, WP, Agrawal, D & Candan, KS 2008, 'Scalable filtering of multiple generalized-tree-pattern queries over XML streams', IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 12, 4509431, pp. 1627-1640. https://doi.org/10.1109/TKDE.2008.83

Chen, Songting ; Li, Hua Gang ; Tatemura, Jun'ichi ; Hsiung, Wang Pin ; Agrawal, Divyakant ; Candan, K. Selçuk. / Scalable filtering of multiple generalized-tree-pattern queries over XML streams. In: IEEE Transactions on Knowledge and Data Engineering. 2008 ; Vol. 20, No. 12. pp. 1627-1640.
@article{f8e1bab6b6a3494d979dcd4d946542ab,
title = "Scalable filtering of multiple generalized-tree-pattern queries over XML streams",
abstract = "An XML publish/subscribe system needs to filter a large number of queries over XML streams. Most existing systems only consider filtering the simple XPath statements. In this paper, we focus on filtering of the more complex Generalized-Tree-Pattern (GTP) queries. Our filtering mechanism is based on a novel Tree-of-Path (TOP) encoding scheme, which compactly represents the path matches for the entire document. First, we show that the TOP encodings can be efficiently produced via shared bottom-up path matching. Second, with the aid of this TOP encoding, we can 1) achieve polynomial time and space complexity for postprocessing, 2) avoid redundant predicate evaluations, 3) allow an efficient duplicate-free and merge join-based algorithm for merging multiple encoded path matches, and 4) simplify the processing of GTP queries. Overall, our approach maximizes the sharing opportunity across queries by exploiting the suffix as well as prefix sharing. At the same time, our TOP encodings allow efficient postprocessing for GTP queries. Extensive performance studies show that GFilter not only achieves significantly better filtering performance than state-of-the-art algorithms but also is capable of efficiently filtering the more complex GTP queries.",
keywords = "Generalized-tree-pattern queries, Result encoding, XML filtering, XML streams",
author = "Songting Chen and Li, {Hua Gang} and Jun'ichi Tatemura and Hsiung, {Wang Pin} and Divyakant Agrawal and Candan, {K. Sel{\c{c}}uk}",
year = "2008",
month = "12",
day = "1",
doi = "10.1109/TKDE.2008.83",
language = "English",
volume = "20",
pages = "1627--1640",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE Computer Society",
number = "12",

}

TY  - JOUR
T1  - Scalable filtering of multiple generalized-tree-pattern queries over XML streams
AU  - Chen, Songting
AU  - Li, Hua Gang
AU  - Tatemura, Jun'ichi
AU  - Hsiung, Wang Pin
AU  - Agrawal, Divyakant
AU  - Candan, K. Selçuk
PY  - 2008/12/1
Y1  - 2008/12/1
N2  - An XML publish/subscribe system needs to filter a large number of queries over XML streams. Most existing systems only consider filtering the simple XPath statements. In this paper, we focus on filtering of the more complex Generalized-Tree-Pattern (GTP) queries. Our filtering mechanism is based on a novel Tree-of-Path (TOP) encoding scheme, which compactly represents the path matches for the entire document. First, we show that the TOP encodings can be efficiently produced via shared bottom-up path matching. Second, with the aid of this TOP encoding, we can 1) achieve polynomial time and space complexity for postprocessing, 2) avoid redundant predicate evaluations, 3) allow an efficient duplicate-free and merge join-based algorithm for merging multiple encoded path matches, and 4) simplify the processing of GTP queries. Overall, our approach maximizes the sharing opportunity across queries by exploiting the suffix as well as prefix sharing. At the same time, our TOP encodings allow efficient postprocessing for GTP queries. Extensive performance studies show that GFilter not only achieves significantly better filtering performance than state-of-the-art algorithms but also is capable of efficiently filtering the more complex GTP queries.
AB  - An XML publish/subscribe system needs to filter a large number of queries over XML streams. Most existing systems only consider filtering the simple XPath statements. In this paper, we focus on filtering of the more complex Generalized-Tree-Pattern (GTP) queries. Our filtering mechanism is based on a novel Tree-of-Path (TOP) encoding scheme, which compactly represents the path matches for the entire document. First, we show that the TOP encodings can be efficiently produced via shared bottom-up path matching. Second, with the aid of this TOP encoding, we can 1) achieve polynomial time and space complexity for postprocessing, 2) avoid redundant predicate evaluations, 3) allow an efficient duplicate-free and merge join-based algorithm for merging multiple encoded path matches, and 4) simplify the processing of GTP queries. Overall, our approach maximizes the sharing opportunity across queries by exploiting the suffix as well as prefix sharing. At the same time, our TOP encodings allow efficient postprocessing for GTP queries. Extensive performance studies show that GFilter not only achieves significantly better filtering performance than state-of-the-art algorithms but also is capable of efficiently filtering the more complex GTP queries.
KW  - Generalized-tree-pattern queries
KW  - Result encoding
KW  - XML filtering
KW  - XML streams
UR  - http://www.scopus.com/inward/record.url?scp=55949123584&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=55949123584&partnerID=8YFLogxK
U2  - 10.1109/TKDE.2008.83
DO  - 10.1109/TKDE.2008.83
M3  - Article
VL  - 20
SP  - 1627
EP  - 1640
JO  - IEEE Transactions on Knowledge and Data Engineering
JF  - IEEE Transactions on Knowledge and Data Engineering
SN  - 1041-4347
IS  - 12
M1  - 4509431
ER  - 