Clustering declustered data for efficient retrieval

Hakan Ferhatosmanoglu, Divyakant Agrawal, Amr El Abbadi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

Modern databases increasingly integrate new kinds of information, such as multimedia information in the form of image, video, and audio data. Both the dimensionality and the amount of data that need to be processed are increasing rapidly, increasing the demand for the efficient retrieval of large amounts of multi-dimensional data. Declustering techniques for multi-disk architectures have been effectively used for storage. In this paper, we first establish that besides exploiting the parallelism, a careful organization of each disk must be considered for fast searching. We introduce the notion of page allocation and data space mapping which can be used to organize and retrieve multidimensional data. We develop these notions based on three different partitioning strategies: regular grid partitioning, concentric hypercubes and hyperpyramids. We develop techniques that satisfy efficient retrieval by optimizing the number of buckets retrieved by the query, disk arm movement and I/O parallelism. We prove that concentric hypercube-based mapping satisfies the optimal clustering and optimal parallelism. We develop a technique based on hyperpyramid partitioning that reduces the number of buckets retrieved by the query and has efficient inter- and intra-disk organizations. We evaluate the performance of proposed techniques by comparing them with the current approaches. The new techniques lead to very significant improvement over the existing techniques, and result in fast retrieval of multi-dimensional data.

Original language: English
Title of host publication: International Conference on Information and Knowledge Management, Proceedings
Place of Publication: New York, NY, United States
Publisher: ACM
Pages: 343-350
Number of pages: 8
ISBN (Print): 1581131461
Publication status: Published - 1 Dec 1999
Externally published: Yes
Event: Proceedings of the 1999 8th International Conference on Information Knowledge Management (CIKM'99) - Kansas City, MO, USA
Duration: 2 Nov 1999 - 6 Nov 1999

Fingerprint

Data clustering
Partitioning
Query
Grid
Data base
Clustering
Multimedia
Dimensionality

ASJC Scopus subject areas

  • Business, Management and Accounting(all)

Cite this

Ferhatosmanoglu, H., Agrawal, D., & El Abbadi, A. (1999). Clustering declustered data for efficient retrieval. In International Conference on Information and Knowledge Management, Proceedings (pp. 343-350). New York, NY, United States: ACM.

Clustering declustered data for efficient retrieval. / Ferhatosmanoglu, Hakan; Agrawal, Divyakant; El Abbadi, Amr.

International Conference on Information and Knowledge Management, Proceedings. New York, NY, United States : ACM, 1999. p. 343-350.

Ferhatosmanoglu, H, Agrawal, D & El Abbadi, A 1999, Clustering declustered data for efficient retrieval. in International Conference on Information and Knowledge Management, Proceedings. ACM, New York, NY, United States, pp. 343-350, Proceedings of the 1999 8th International Conference on Information Knowledge Management (CIKM'99), Kansas City, MO, USA, 2 Nov 1999.
Ferhatosmanoglu H, Agrawal D, El Abbadi A. Clustering declustered data for efficient retrieval. In International Conference on Information and Knowledge Management, Proceedings. New York, NY, United States: ACM. 1999. p. 343-350
Ferhatosmanoglu, Hakan ; Agrawal, Divyakant ; El Abbadi, Amr. / Clustering declustered data for efficient retrieval. International Conference on Information and Knowledge Management, Proceedings. New York, NY, United States : ACM, 1999. pp. 343-350
@inproceedings{206d6147af1a4e53a37af16d9e4aac37,
title = "Clustering declustered data for efficient retrieval",
abstract = "Modern databases increasingly integrate new kinds of information, such as multimedia information in the form of image, video, and audio data. Both the dimensionality and the amount of data that need to be processed are increasing rapidly, increasing the demand for the efficient retrieval of large amounts of multi-dimensional data. Declustering techniques for multi-disk architectures have been effectively used for storage. In this paper, we first establish that besides exploiting the parallelism, a careful organization of each disk must be considered for fast searching. We introduce the notion of page allocation and data space mapping which can be used to organize and retrieve multidimensional data. We develop these notions based on three different partitioning strategies: regular grid partitioning, concentric hypercubes and hyperpyramids. We develop techniques that satisfy efficient retrieval by optimizing the number of buckets retrieved by the query, disk arm movement and I/O parallelism. We prove that concentric hypercube-based mapping satisfies the optimal clustering and optimal parallelism. We develop a technique based on hyperpyramid partitioning that reduces the number of buckets retrieved by the query and has efficient inter- and intra-disk organizations. We evaluate the performance of proposed techniques by comparing them with the current approaches. The new techniques lead to very significant improvement over the existing techniques, and result in fast retrieval of multi-dimensional data.",
author = "Hakan Ferhatosmanoglu and Divyakant Agrawal and {El Abbadi}, Amr",
year = "1999",
month = "12",
day = "1",
language = "English",
isbn = "1581131461",
pages = "343--350",
booktitle = "International Conference on Information and Knowledge Management, Proceedings",
publisher = "ACM",
address = "New York, NY, United States",
}

TY - GEN

T1 - Clustering declustered data for efficient retrieval

AU - Ferhatosmanoglu, Hakan

AU - Agrawal, Divyakant

AU - El Abbadi, Amr

PY - 1999/12/1

Y1 - 1999/12/1

N2 - Modern databases increasingly integrate new kinds of information, such as multimedia information in the form of image, video, and audio data. Both the dimensionality and the amount of data that need to be processed are increasing rapidly, increasing the demand for the efficient retrieval of large amounts of multi-dimensional data. Declustering techniques for multi-disk architectures have been effectively used for storage. In this paper, we first establish that besides exploiting the parallelism, a careful organization of each disk must be considered for fast searching. We introduce the notion of page allocation and data space mapping which can be used to organize and retrieve multidimensional data. We develop these notions based on three different partitioning strategies: regular grid partitioning, concentric hypercubes and hyperpyramids. We develop techniques that satisfy efficient retrieval by optimizing the number of buckets retrieved by the query, disk arm movement and I/O parallelism. We prove that concentric hypercube-based mapping satisfies the optimal clustering and optimal parallelism. We develop a technique based on hyperpyramid partitioning that reduces the number of buckets retrieved by the query and has efficient inter- and intra-disk organizations. We evaluate the performance of proposed techniques by comparing them with the current approaches. The new techniques lead to very significant improvement over the existing techniques, and result in fast retrieval of multi-dimensional data.

UR - http://www.scopus.com/inward/record.url?scp=0033279660&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0033279660&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:0033279660

SN - 1581131461

SP - 343

EP - 350

BT - International Conference on Information and Knowledge Management, Proceedings

PB - ACM

CY - New York, NY, United States

ER -