Disk allocation for fast range and nearest-neighbor queries

Sunil Prabhakar, Divyakant Agrawal, Amr El Abbadi

Research output: Contribution to journal › Article

Abstract

As databases increasingly integrate non-textual multimedia information, it is becoming necessary to support efficient similarity searching in addition to range searching. Range and nearest-neighbor (similarity) queries are the most important class of queries for multimedia and multi-dimensional databases. Due to the large sizes of the datasets involved, I/O is a critical factor limiting performance. The use of parallel I/O through declustering of the data is a promising approach to improve performance. Consequently, several research efforts have addressed the problem of declustering multidimensional data for optimizing range and partial match queries. Very limited work has been done for similarity queries, and the problem of declustering for combined range and similarity queries has not been addressed in the literature. Consider a dataset of images where the following metadata for each image is also stored: the date on which the picture was taken, and the longitude and latitude of the site of the picture. An example of a combined query is: given a target image, find the 5 most similar images taken within 3 months of the target image and located within 2 degrees of longitude and latitude of the target image. In order to answer this query, it is necessary to conduct a range search on the date, longitude, and latitude values and a similarity search on the image content. In this paper, we develop new declustering schemes that provide good declustering for similarity searching. In addition, we show that the new schemes have very good performance for range queries as well as combination queries. The new schemes are based upon the Cyclic declustering schemes, which were developed for range and partial match queries. The Cyclic schemes not only provide superior performance to earlier schemes, but are also very robust and consistent with respect to query types and variations in system parameters.
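
The combined query in the abstract can be illustrated with a minimal in-memory sketch: a range filter on the metadata followed by a k-nearest-neighbor ranking on image content. This is not the paper's declustering scheme and involves no disks or parallel I/O; it only shows the query semantics. The names `ImageRecord` and `combined_query`, the feature-vector representation, the Euclidean distance, and the approximation of "3 months" as 90 days are all assumptions made for illustration. Longitude wrap-around at ±180 degrees is ignored.

```python
from dataclasses import dataclass
from datetime import date
import math


@dataclass
class ImageRecord:
    """Hypothetical record: metadata plus a content feature vector."""
    name: str
    taken: date
    lon: float
    lat: float
    features: tuple  # assumed fixed-length numeric feature vector


def combined_query(target, records, k=5, max_days=90, max_deg=2.0):
    """Return the k records most similar to target among those in range.

    Range part: date within max_days, longitude and latitude each within
    max_deg of the target. Similarity part: Euclidean distance between
    feature vectors (an assumed similarity measure).
    """
    # Step 1: range search on date, longitude, and latitude.
    in_range = [
        r for r in records
        if r is not target
        and abs((r.taken - target.taken).days) <= max_days
        and abs(r.lon - target.lon) <= max_deg
        and abs(r.lat - target.lat) <= max_deg
    ]
    # Step 2: similarity search on image content among the survivors.
    in_range.sort(key=lambda r: math.dist(r.features, target.features))
    return in_range[:k]
```

In a declustered setting, the records (or the buckets holding them) touched by both steps would ideally be spread across different disks so they can be fetched in parallel; the sketch above leaves that placement question aside.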

Original language: English
Pages (from-to): 107-135
Number of pages: 29
Journal: Distributed and Parallel Databases
Volume: 14
Issue number: 2
DOI: 10.1023/A:1024895525526
Publication status: Published - 1 Sep 2003
Externally published: Yes

Keywords

  • Cyclic allocation
  • Multi-dimensional declustering
  • Parallel I/O

ASJC Scopus subject areas

  • Information Systems
  • Theoretical Computer Science
  • Computational Theory and Mathematics

Cite this

Disk allocation for fast range and nearest-neighbor queries. / Prabhakar, Sunil; Agrawal, Divyakant; El Abbadi, Amr.

In: Distributed and Parallel Databases, Vol. 14, No. 2, 01.09.2003, p. 107-135.

@article{a345a425c7f645cf937b929992678c43,
title = "Disk allocation for fast range and nearest-neighbor queries",
keywords = "Cyclic allocation, Multi-dimensional declustering, Parallel I/O",
author = "Sunil Prabhakar and Divyakant Agrawal and {El Abbadi}, Amr",
year = "2003",
month = "9",
day = "1",
doi = "10.1023/A:1024895525526",
language = "English",
volume = "14",
pages = "107--135",
journal = "Distributed and Parallel Databases",
issn = "0926-8782",
publisher = "Springer Netherlands",
number = "2",

}

TY - JOUR
T1 - Disk allocation for fast range and nearest-neighbor queries
AU - Prabhakar, Sunil
AU - Agrawal, Divyakant
AU - El Abbadi, Amr
PY - 2003/9/1
Y1 - 2003/9/1
KW - Cyclic allocation
KW - Multi-dimensional declustering
KW - Parallel I/O
UR - http://www.scopus.com/inward/record.url?scp=0042319061&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0042319061&partnerID=8YFLogxK
U2 - 10.1023/A:1024895525526
DO - 10.1023/A:1024895525526
M3 - Article
VL - 14
SP - 107
EP - 135
JO - Distributed and Parallel Databases
JF - Distributed and Parallel Databases
SN - 0926-8782
IS - 2
ER -