Data placement strategy for parallel XML databases

Guo Ren Wang, Nan Tang, Ya Xin Yu, Bing Sun, Ge Yu

Research output: Contribution to journal › Article

Abstract

This paper studies strategies for partitioning XML documents so that XML queries can be processed in parallel. To describe the XML data partitioning problem, the paper introduces the concept of an intermediary node. A set of intermediary nodes partitions an XML data tree into a root-tree and a set of sub-trees. The root-tree is duplicated on all processing nodes, while the sub-trees are distributed evenly across those nodes according to the workload of user queries. A given XML data tree admits many different sets of intermediary nodes, and different sets produce different partitions; the quality of a partitioning can be evaluated against the workload of user queries. Choosing an optimal partitioning is NP-hard, so the paper proposes a set of heuristic rules. Building on these ideas, the paper designs and implements an XML data partitioning algorithm, WIN. Extensive experiments show that WIN achieves better speedup and scaleup than existing strategies.
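The abstract does not spell out how WIN works, so the following Python sketch only illustrates the general idea under stated assumptions. It cuts a tree at a chosen set of intermediary nodes, assuming that the root-tree keeps every node not strictly below an intermediary node and that each child of an intermediary node roots one sub-tree. It then spreads the sub-trees over sites using a per-node `weight` as a stand-in for query workload. The names `Node`, `split_tree`, `place` and `imbalance` are hypothetical. The greedy largest-first rule (LPT scheduling) is a generic load-balancing heuristic used in place of the paper's own heuristic rules, which are not described here.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can go in a set
class Node:
    """Hypothetical XML element; `weight` is an assumed workload cost per node."""
    tag: str
    weight: float = 1.0
    children: list[Node] = field(default_factory=list)


def subtree_cost(node: Node) -> float:
    """Workload weight of `node` plus all of its descendants."""
    return node.weight + sum(subtree_cost(c) for c in node.children)


def split_tree(root: Node, intermediary: set[Node]) -> tuple[list[Node], list[Node]]:
    """Cut the tree below each intermediary node.

    Assumption: the root-tree keeps every node not strictly below an
    intermediary node; each child of an intermediary node roots one sub-tree.
    """
    root_tree: list[Node] = []
    subtrees: list[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        root_tree.append(node)  # replicated on every site
        if node in intermediary:
            subtrees.extend(node.children)  # children become sub-tree roots
        else:
            stack.extend(node.children)  # keep walking inside the root-tree
    return root_tree, subtrees


def place(subtrees: list[Node], n_sites: int) -> list[list[Node]]:
    """Greedy largest-first assignment: heaviest sub-tree to least-loaded site."""
    heap = [(0.0, site) for site in range(n_sites)]  # (current load, site id)
    sites: list[list[Node]] = [[] for _ in range(n_sites)]
    for st in sorted(subtrees, key=subtree_cost, reverse=True):
        load, site = heapq.heappop(heap)
        sites[site].append(st)
        heapq.heappush(heap, (load + subtree_cost(st), site))
    return sites


def imbalance(sites: list[list[Node]]) -> float:
    """Max site load over mean site load; 1.0 means perfectly even."""
    loads = [sum(subtree_cost(st) for st in site) for site in sites]
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0


# Toy document: <db> with two <region> elements holding weighted <item>s.
region_a = Node("region", children=[Node("item", weight=w) for w in (5, 3, 2)])
region_b = Node("region", children=[Node("item", weight=w) for w in (4, 4)])
db = Node("db", children=[region_a, region_b])

# Two candidate intermediary node sets give two different partitions.
for cut in ({db}, {region_a, region_b}):
    root_tree, subtrees = split_tree(db, cut)
    sites = place(subtrees, n_sites=2)
    print(len(root_tree), len(subtrees), round(imbalance(sites), 3))
```

With these toy weights, cutting at `<db>` yields site loads of 11 and 9 (imbalance 1.1). Cutting at both `<region>` elements replicates a larger root-tree and yields loads of 10 and 8 (about 1.111). In this sketch a finer cut does not automatically balance better, which is the kind of trade-off a workload-based evaluation of candidate intermediary node sets has to weigh.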

Original language: English
Pages (from-to): 770-781
Number of pages: 12
Journal: Ruan Jian Xue Bao/Journal of Software
Volume: 17
Issue number: 4
DOI: 10.1360/jos170770
Publication status: Published, 1 Apr 2006
Externally published: Yes

Keywords

  • Data partitioning
  • Intermediary node
  • Parallel database
  • Workload
  • XML document

ASJC Scopus subject areas

  • Software

Cite this

Data placement strategy for parallel XML databases. / Wang, Guo Ren; Tang, Nan; Yu, Ya Xin; Sun, Bing; Yu, Ge.

In: Ruan Jian Xue Bao/Journal of Software, Vol. 17, No. 4, 01.04.2006, p. 770-781.

@article{766e280c21524ff6bc6e4347d42eda20,
title = "Data placement strategy for parallel XML databases",
keywords = "Data partitioning, Intermediary node, Parallel database, Workload, XML document",
author = "Wang, {Guo Ren} and Nan Tang and Yu, {Ya Xin} and Bing Sun and Ge Yu",
year = "2006",
month = "4",
day = "1",
doi = "10.1360/jos170770",
language = "English",
volume = "17",
pages = "770--781",
journal = "Ruan Jian Xue Bao/Journal of Software",
issn = "1000-9825",
publisher = "Chinese Academy of Sciences",
number = "4",

}

TY - JOUR

T1 - Data placement strategy for parallel XML databases

AU - Wang, Guo Ren

AU - Tang, Nan

AU - Yu, Ya Xin

AU - Sun, Bing

AU - Yu, Ge

PY - 2006/4/1

Y1 - 2006/4/1

KW - Data partitioning

KW - Intermediary node

KW - Parallel database

KW - Workload

KW - XML document

UR - http://www.scopus.com/inward/record.url?scp=33745162236&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745162236&partnerID=8YFLogxK

U2 - 10.1360/jos170770

DO - 10.1360/jos170770

M3 - Article

VL - 17

SP - 770

EP - 781

JO - Ruan Jian Xue Bao/Journal of Software

JF - Ruan Jian Xue Bao/Journal of Software

SN - 1000-9825

IS - 4

ER -