Duplicate elimination in space-partitioning tree indexes

M. Y. Eltabakh, Mourad Ouzzani, Walid G. Aref

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

Abstract

Space-partitioning trees, like the disk-based trie, quadtree, kd-tree and their variants, are a family of access methods that index multi-dimensional objects. In the case of indexing non-zero extent objects, e.g., line segments and rectangles, space-partitioning trees may replicate objects over multiple space partitions, e.g., PMR quadtree, expanded MX-CIF quadtree, and extended kd-tree. As a result, the answer to a query over these indexes may include duplicates, i.e., the same object may be reported more than once, and these duplicates need to be eliminated. In this paper, we propose generic duplicate elimination techniques for the class of space-partitioning trees in the context of SP-GiST, an extensible indexing framework for realizing space-partitioning trees. The proposed techniques are embedded inside the INDEX-SCAN operator. Therefore, duplicate copies of the same object do not propagate in the query plan, and the elimination process is transparent to the end-users. Two cases for the index structures are considered based on whether or not the objects' coordinates are stored inside the index tree. The theoretical and experimental analyses show that the proposed techniques achieve savings in the storage requirements, I/O operations, and processing time when compared to adding a separate duplicate elimination operator in the query plan.
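
To make the duplicate problem concrete, here is a minimal Python sketch. It is not the authors' SP-GiST technique, which the abstract does not spell out. A uniform grid stands in for the partitions of a space-partitioning tree, and the scan uses the commonly known reference-point rule as a stand-in: an object is reported only from the partition that contains the lower-left corner of its intersection with the query. All names (`CELL`, `build_index`, `naive_scan`, `dedup_scan`) are hypothetical, and the rule assumes the object's coordinates are available to the scan.

```python
from typing import Dict, Iterator, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), closed intervals
Cell = Tuple[int, int]

CELL = 10.0  # hypothetical cell width; the grid is a stand-in for tree partitions


def cells_of(r: Rect) -> Iterator[Cell]:
    """Yield every grid cell that rectangle r overlaps."""
    x0, y0, x1, y1 = r
    for cx in range(int(x0 // CELL), int(x1 // CELL) + 1):
        for cy in range(int(y0 // CELL), int(y1 // CELL) + 1):
            yield cx, cy


def build_index(objects: Dict[str, Rect]) -> Dict[Cell, List[str]]:
    """Replicate each object id into every cell its rectangle overlaps."""
    index: Dict[Cell, List[str]] = {}
    for oid, r in objects.items():
        for c in cells_of(r):
            index.setdefault(c, []).append(oid)
    return index


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def naive_scan(index, objects, query: Rect) -> Iterator[str]:
    """Scan without elimination: a replicated object is reported once per cell."""
    for c in cells_of(query):
        for oid in index.get(c, []):
            if intersects(objects[oid], query):
                yield oid


def dedup_scan(index, objects, query: Rect) -> Iterator[str]:
    """Scan with elimination inside the scan itself (reference-point rule)."""
    for c in cells_of(query):
        for oid in index.get(c, []):
            r = objects[oid]
            if not intersects(r, query):
                continue
            # Reference point: lower-left corner of (object ∩ query).
            px, py = max(r[0], query[0]), max(r[1], query[1])
            # Exactly one cell contains the reference point, so report only there.
            if (int(px // CELL), int(py // CELL)) == c:
                yield oid
```

With a rectangle that spans three cells, `naive_scan` reports it three times, while `dedup_scan` reports it once and needs no separate duplicate-elimination step downstream of the scan.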

Original language: English
Title of host publication: Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM
DOI: https://doi.org/10.1109/SSDBM.2007.10
Publication status: Published - 1 Dec 2007
Externally published: Yes
Event: 19th International Conference on Scientific and Statistical Database Management, SSDBM 2007 - Banff, AB, Canada
Duration: 9 Jul 2007 to 11 Jul 2007

Other

Other: 19th International Conference on Scientific and Statistical Database Management, SSDBM 2007
Country: Canada
City: Banff, AB
Period: 9/7/07 to 11/7/07

Fingerprint

Elimination
Partitioning
Quadtree
Processing
Kd-tree
Query
Indexing
Experimental Analysis
Line segment
Operator
Rectangle
Object
Theoretical Analysis
Partition
Requirements

ASJC Scopus subject areas

  • Software
  • Applied Mathematics

Cite this

Eltabakh, M. Y., Ouzzani, M., & Aref, W. G. (2007). Duplicate elimination in space-partitioning tree indexes. In Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM [4274963] https://doi.org/10.1109/SSDBM.2007.10

Duplicate elimination in space-partitioning tree indexes. / Eltabakh, M. Y.; Ouzzani, Mourad; Aref, Walid G.

Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM. 2007. 4274963.

Eltabakh, MY, Ouzzani, M & Aref, WG 2007, Duplicate elimination in space-partitioning tree indexes. in Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM., 4274963, 19th International Conference on Scientific and Statistical Database Management, SSDBM 2007, Banff, AB, Canada, 9/7/07. https://doi.org/10.1109/SSDBM.2007.10
Eltabakh MY, Ouzzani M, Aref WG. Duplicate elimination in space-partitioning tree indexes. In Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM. 2007. 4274963 https://doi.org/10.1109/SSDBM.2007.10
Eltabakh, M. Y. ; Ouzzani, Mourad ; Aref, Walid G. / Duplicate elimination in space-partitioning tree indexes. Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM. 2007.
@inproceedings{52562004b0ab4aabac651d996b621108,
title = "Duplicate elimination in space-partitioning tree indexes",
abstract = "Space-partitioning trees, like the disk-based trie, quadtree, kd-tree and their variants, are a family of access methods that index multi-dimensional objects. In the case of indexing non-zero extent objects, e.g., line segments and rectangles, space-partitioning trees may replicate objects over multiple space partitions, e.g., PMR quadtree, expanded MX-CIF quadtree, and extended kd-tree. As a result, the answer to a query over these indexes may include duplicates, i.e., the same object may be reported more than once, and these duplicates need to be eliminated. In this paper, we propose generic duplicate elimination techniques for the class of space-partitioning trees in the context of SP-GiST, an extensible indexing framework for realizing space-partitioning trees. The proposed techniques are embedded inside the INDEX-SCAN operator. Therefore, duplicate copies of the same object do not propagate in the query plan, and the elimination process is transparent to the end-users. Two cases for the index structures are considered based on whether or not the objects' coordinates are stored inside the index tree. The theoretical and experimental analyses show that the proposed techniques achieve savings in the storage requirements, I/O operations, and processing time when compared to adding a separate duplicate elimination operator in the query plan.",
author = "Eltabakh, {M. Y.} and Mourad Ouzzani and Aref, {Walid G.}",
year = "2007",
month = "12",
day = "1",
doi = "10.1109/SSDBM.2007.10",
language = "English",
isbn = "0769528686",
booktitle = "Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM",

}

TY - GEN

T1 - Duplicate elimination in space-partitioning tree indexes

AU - Eltabakh, M. Y.

AU - Ouzzani, Mourad

AU - Aref, Walid G.

PY - 2007/12/1

Y1 - 2007/12/1

AB - Space-partitioning trees, like the disk-based trie, quadtree, kd-tree and their variants, are a family of access methods that index multi-dimensional objects. In the case of indexing non-zero extent objects, e.g., line segments and rectangles, space-partitioning trees may replicate objects over multiple space partitions, e.g., PMR quadtree, expanded MX-CIF quadtree, and extended kd-tree. As a result, the answer to a query over these indexes may include duplicates, i.e., the same object may be reported more than once, and these duplicates need to be eliminated. In this paper, we propose generic duplicate elimination techniques for the class of space-partitioning trees in the context of SP-GiST, an extensible indexing framework for realizing space-partitioning trees. The proposed techniques are embedded inside the INDEX-SCAN operator. Therefore, duplicate copies of the same object do not propagate in the query plan, and the elimination process is transparent to the end-users. Two cases for the index structures are considered based on whether or not the objects' coordinates are stored inside the index tree. The theoretical and experimental analyses show that the proposed techniques achieve savings in the storage requirements, I/O operations, and processing time when compared to adding a separate duplicate elimination operator in the query plan.

UR - http://www.scopus.com/inward/record.url?scp=46649121240&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=46649121240&partnerID=8YFLogxK

U2 - 10.1109/SSDBM.2007.10

DO - 10.1109/SSDBM.2007.10

M3 - Conference contribution

SN - 0769528686

SN - 9780769528687

BT - Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM

ER -