Twig2Stack: Bottom-up processing of Generalized-Tree-Pattern queries over XML documents

Songting Chen, Hua Gang Li, Junichi Tatemura, Wang Pin Hsiung, Divyakant Agrawal, K. Selçuk Candan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

135 Citations (Scopus)

Abstract

Tree pattern matching is one of the most fundamental tasks for XML query processing. Holistic twig query processing techniques [4, 16] have been developed to minimize the intermediate results, namely, those root-to-leaf path matches that are not in the final twig results. However, useless path matches cannot be completely avoided, especially when there is a parent-child relationship in the twig query. Furthermore, existing approaches do not consider the fact that in practice, in order to process XPath or XQuery statements, a more powerful form of twig queries, namely, Generalized-Tree-Pattern (GTP) [8] queries, is required. Most existing works on processing GTP queries generally call for costly post-processing for eliminating redundant data and/or grouping of the matching results. In this paper, we first propose a novel hierarchical stack encoding scheme to compactly represent the twig results. We introduce Twig2Stack, a bottom-up algorithm for processing twig queries based on this encoding scheme. Then we show how to efficiently enumerate the query results from the encodings for a given GTP query. To our knowledge, this is the first GTP matching solution that avoids any post path-join, sort, duplicate elimination and grouping operations. Extensive performance studies on various data sets and queries show that the proposed Twig2Stack algorithm not only has better twig query processing performance than state-of-the-art algorithms, but is also capable of efficiently processing the more complex GTP queries.

Original language: English
Title of host publication: VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases
Pages: 283-294
Number of pages: 12
Publication status: Published - 1 Dec 2006
Externally published: Yes
Event: 32nd International Conference on Very Large Data Bases, VLDB 2006 - Seoul, Korea, Republic of
Duration: 12 Sep 2006 to 15 Sep 2006

Other

Other: 32nd International Conference on Very Large Data Bases, VLDB 2006
Country: Korea, Republic of
City: Seoul
Period: 12/9/06 to 15/9/06

Fingerprint

Query processing
XML
Pattern matching
Processing
Query
Bottom-up

ASJC Scopus subject areas

  • Hardware and Architecture
  • Information Systems
  • Software
  • Information Systems and Management

Cite this

Chen, S., Li, H. G., Tatemura, J., Hsiung, W. P., Agrawal, D., & Candan, K. S. (2006). Twig2Stack: Bottom-up processing of Generalized-Tree-Pattern queries over XML documents. In VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases (pp. 283-294)

Twig2Stack : Bottom-up processing of Generalized-Tree-Pattern queries over XML documents. / Chen, Songting; Li, Hua Gang; Tatemura, Junichi; Hsiung, Wang Pin; Agrawal, Divyakant; Candan, K. Selçuk.

VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases. 2006. p. 283-294.

Chen, S, Li, HG, Tatemura, J, Hsiung, WP, Agrawal, D & Candan, KS 2006, Twig2Stack: Bottom-up processing of Generalized-Tree-Pattern queries over XML documents. in VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases. pp. 283-294, 32nd International Conference on Very Large Data Bases, VLDB 2006, Seoul, Korea, Republic of, 12/9/06.
Chen S, Li HG, Tatemura J, Hsiung WP, Agrawal D, Candan KS. Twig2Stack: Bottom-up processing of Generalized-Tree-Pattern queries over XML documents. In VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases. 2006. p. 283-294
Chen, Songting ; Li, Hua Gang ; Tatemura, Junichi ; Hsiung, Wang Pin ; Agrawal, Divyakant ; Candan, K. Selçuk. / Twig2Stack : Bottom-up processing of Generalized-Tree-Pattern queries over XML documents. VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases. 2006. pp. 283-294
@inproceedings{99805c4f11fb49e3ae7e34d9d00de362,
title = "Twig2Stack: Bottom-up processing of Generalized-Tree-Pattern queries over XML documents",
abstract = "Tree pattern matching is one of the most fundamental tasks for XML query processing. Holistic twig query processing techniques [4, 16] have been developed to minimize the intermediate results, namely, those root-to-leaf path matches that are not in the final twig results. However, useless path matches cannot be completely avoided, especially when there is a parent-child relationship in the twig query. Furthermore, existing approaches do not consider the fact that in practice, in order to process XPath or XQuery statements, a more powerful form of twig queries, namely, Generalized-Tree-Pattern (GTP) [8] queries, is required. Most existing works on processing GTP queries generally call for costly post-processing for eliminating redundant data and/or grouping of the matching results. In this paper, we first propose a novel hierarchical stack encoding scheme to compactly represent the twig results. We introduce Twig2Stack, a bottom-up algorithm for processing twig queries based on this encoding scheme. Then we show how to efficiently enumerate the query results from the encodings for a given GTP query. To our knowledge, this is the first GTP matching solution that avoids any post path-join, sort, duplicate elimination and grouping operations. Extensive performance studies on various data sets and queries show that the proposed Twig2Stack algorithm not only has better twig query processing performance than state-of-the-art algorithms, but is also capable of efficiently processing the more complex GTP queries.",
author = "Songting Chen and Li, {Hua Gang} and Junichi Tatemura and Hsiung, {Wang Pin} and Divyakant Agrawal and Candan, {K. Sel{\cc}uk}",
year = "2006",
month = "12",
day = "1",
language = "English",
isbn = "1595933859",
pages = "283--294",
booktitle = "VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases",

}

TY - GEN

T1 - Twig2Stack

T2 - Bottom-up processing of Generalized-Tree-Pattern queries over XML documents

AU - Chen, Songting

AU - Li, Hua Gang

AU - Tatemura, Junichi

AU - Hsiung, Wang Pin

AU - Agrawal, Divyakant

AU - Candan, K. Selçuk

PY - 2006/12/1

Y1 - 2006/12/1

N2 - Tree pattern matching is one of the most fundamental tasks for XML query processing. Holistic twig query processing techniques [4, 16] have been developed to minimize the intermediate results, namely, those root-to-leaf path matches that are not in the final twig results. However, useless path matches cannot be completely avoided, especially when there is a parent-child relationship in the twig query. Furthermore, existing approaches do not consider the fact that in practice, in order to process XPath or XQuery statements, a more powerful form of twig queries, namely, Generalized-Tree-Pattern (GTP) [8] queries, is required. Most existing works on processing GTP queries generally call for costly post-processing for eliminating redundant data and/or grouping of the matching results. In this paper, we first propose a novel hierarchical stack encoding scheme to compactly represent the twig results. We introduce Twig2Stack, a bottom-up algorithm for processing twig queries based on this encoding scheme. Then we show how to efficiently enumerate the query results from the encodings for a given GTP query. To our knowledge, this is the first GTP matching solution that avoids any post path-join, sort, duplicate elimination and grouping operations. Extensive performance studies on various data sets and queries show that the proposed Twig2Stack algorithm not only has better twig query processing performance than state-of-the-art algorithms, but is also capable of efficiently processing the more complex GTP queries.

UR - http://www.scopus.com/inward/record.url?scp=38049042201&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=38049042201&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:38049042201

SN - 1595933859

SN - 9781595933850

SP - 283

EP - 294

BT - VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases

ER -