Bulk operations for space-partitioning trees

Thanaa M. Ghanem, Rahul Shah, Mohamed Mokbel, Walid G. Aref, Jeffrey S. Vitter

Research output: Contribution to conference - Paper

23 Citations (Scopus)

Abstract

The emergence of extensible index structures, e.g., GiST (Generalized Search Tree) and SP-GiST (Space-Partitioning Generalized Search Tree), calls for a set of extensible algorithms to support different operations (e.g., insertion, deletion, and search). Extensible bulk operations (e.g., bulk loading and bulk insertion) are equally important and need to be supported in these index engines. In this paper, we propose two extensible buffer-based algorithms for bulk operations in the class of space-partitioning trees, a class of hierarchical data structures that recursively decompose the space into disjoint partitions. The main idea of these algorithms is to build an in-memory tree of the target space-partitioning index; data items are then recursively partitioned into disk-based buffers using the in-memory tree. Although the second algorithm is designed for bulk insertion, it can be used for bulk loading as well. The proposed extensible algorithms are implemented inside SP-GiST, a framework for supporting the class of space-partitioning trees. Both algorithms have an I/O bound of O(NH/B), where N is the number of data items to be bulk loaded or inserted, B is the number of tree nodes that can fit in one disk page, and H is the tree height in terms of pages after applying a clustering algorithm. Experimental results are provided to show the scalability and applicability of the proposed algorithms for the class of space-partitioning trees. A comparison of the two proposed algorithms shows that the first algorithm performs better in the case of bulk loading. However, the second algorithm is more general and can be used for efficient bulk insertion.
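The partition-into-buffers idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm or the SP-GiST interface: it assumes a point quadtree as the space-partitioning tree, uses plain Python lists as stand-ins for disk-based buffers, and splits only one level of the in-memory tree per recursive step. The names `Node`, `quadrant`, `split`, `bulk_load`, `iter_leaves`, `leaf_capacity`, and `max_depth` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Rectangular region covered by this node (assumed 2-D for illustration).
    x0: float
    y0: float
    x1: float
    y1: float
    children: list = field(default_factory=list)  # four quadrants, empty for a leaf
    buffer: list = field(default_factory=list)    # stand-in for a disk-based buffer


def quadrant(node, p):
    """Return the index (0..3) of the child quadrant of `node` containing point p."""
    mx, my = (node.x0 + node.x1) / 2, (node.y0 + node.y1) / 2
    return (1 if p[0] >= mx else 0) + (2 if p[1] >= my else 0)


def split(node):
    """Decompose the node's region into four disjoint quadrants."""
    mx, my = (node.x0 + node.x1) / 2, (node.y0 + node.y1) / 2
    node.children = [
        Node(node.x0, node.y0, mx, my),  # index 0: low x, low y
        Node(mx, node.y0, node.x1, my),  # index 1: high x, low y
        Node(node.x0, my, mx, node.y1),  # index 2: low x, high y
        Node(mx, my, node.x1, node.y1),  # index 3: high x, high y
    ]


def bulk_load(node, items, leaf_capacity=4, depth=0, max_depth=16):
    """Recursively route items into per-partition buffers until each fits in a leaf."""
    if len(items) <= leaf_capacity or depth == max_depth:
        node.buffer = list(items)  # small enough: store the items in this leaf
        return node
    split(node)
    # One pass over the items: append each to the buffer of its partition.
    for p in items:
        node.children[quadrant(node, p)].buffer.append(p)
    # Each buffer is then processed independently of the others.
    for child in node.children:
        pending, child.buffer = child.buffer, []
        bulk_load(child, pending, leaf_capacity, depth + 1, max_depth)
    return node


def iter_leaves(node):
    """Yield every leaf node in the tree."""
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from iter_leaves(child)


points = [(0.1, 0.1), (0.9, 0.9), (0.2, 0.8), (0.8, 0.2),
          (0.15, 0.12), (0.11, 0.13), (0.6, 0.6)]
root = bulk_load(Node(0.0, 0.0, 1.0, 1.0), points, leaf_capacity=2)
leaves = list(iter_leaves(root))
```

In this sketch every item is read once per level it descends, which is the intuition behind a bound that grows with both the number of items and the tree height; the `max_depth` guard is an added assumption that stops the recursion when many identical points land in the same partition.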

Original language: English
Pages: 29-40
Number of pages: 12
DOI: https://doi.org/10.1109/ICDE.2004.1319982
Publication status: Published - 1 Jun 2004
Externally published: Yes
Event: Proceedings - 20th International Conference on Data Engineering - ICDE 2004 - Boston, MA., United States
Duration: 30 Mar 2004 - 2 Apr 2004

Other

Other: Proceedings - 20th International Conference on Data Engineering - ICDE 2004
Country: United States
City: Boston, MA.
Period: 30/3/04 - 2/4/04

Fingerprint

Data storage equipment
Trees (mathematics)
Clustering algorithms
Data structures
Scalability
Engines

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems

Cite this

Ghanem, T. M., Shah, R., Mokbel, M., Aref, W. G., & Vitter, J. S. (2004). Bulk operations for space-partitioning trees. 29-40. Paper presented at Proceedings - 20th International Conference on Data Engineering - ICDE 2004, Boston, MA., United States. https://doi.org/10.1109/ICDE.2004.1319982

@conference{ccdd72a8a1c2454aa4db26244829ba71,
title = "Bulk operations for space-partitioning trees",
author = "Ghanem, {Thanaa M.} and Rahul Shah and Mohamed Mokbel and Aref, {Walid G.} and Vitter, {Jeffrey S.}",
year = "2004",
month = "6",
day = "1",
doi = "10.1109/ICDE.2004.1319982",
language = "English",
pages = "29--40",
note = "Proceedings - 20th International Conference on Data Engineering - ICDE 2004 ; Conference date: 30-03-2004 Through 02-04-2004",

}

TY - CONF

T1 - Bulk operations for space-partitioning trees

AU - Ghanem, Thanaa M.

AU - Shah, Rahul

AU - Mokbel, Mohamed

AU - Aref, Walid G.

AU - Vitter, Jeffrey S.

PY - 2004/6/1

Y1 - 2004/6/1

UR - http://www.scopus.com/inward/record.url?scp=2442565562&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=2442565562&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2004.1319982

DO - 10.1109/ICDE.2004.1319982

M3 - Paper

SP - 29

EP - 40

ER -