Declustering large multidimensional data sets for range queries over heterogeneous disks

Jonghyun Lee, M. Winslett, Xiaosong Ma, Shengke Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Declustering is a technique to distribute data sets over multiple disks so that future retrievals can be well balanced over the disks and be performed in parallel. Although clusters often have heterogeneous disks, most declustering work has focused only on homogeneous environments. In this work, we investigate the declustering problem for a heterogeneous disk environment using virtual servers, and propose approaches for deciding the number of virtual servers and the mapping between virtual servers and physical disks. Our experimental results show that by combining our algorithm for choosing the number of virtual servers with a greedy algorithm for mapping virtual servers to disks, users can expect range query retrieval performance within 4% of the optimum achievable in practice on average, in all configurations studied. Compared to an intuitively natural approach to the problem, this represents an improvement of 8-31% in average fetch ratio, as well as a 26-38% reduction in the standard deviation of performance for small queries.
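The abstract does not spell out the greedy mapping step, so the following is only an illustrative sketch of one plausible greedy rule, not the authors' algorithm. It assumes every virtual server holds an equal share of the data and that each disk's relative bandwidth is known; the function name `greedy_map` and its parameters are hypothetical.

```python
def greedy_map(num_virtual_servers, disk_bandwidths):
    """Assign equally sized virtual servers to physical disks, one at a time.

    Assumption (not from the paper): the cost of a disk is its estimated
    transfer time, i.e. virtual servers held divided by bandwidth. Each
    step gives the next virtual server to the disk whose cost would be
    smallest after receiving it; ties go to the lowest disk index.
    """
    counts = [0] * len(disk_bandwidths)
    mapping = []  # mapping[v] is the disk index chosen for virtual server v
    for _ in range(num_virtual_servers):
        best = min(
            range(len(disk_bandwidths)),
            key=lambda d: (counts[d] + 1) / disk_bandwidths[d],
        )
        counts[best] += 1
        mapping.append(best)
    return mapping, counts


if __name__ == "__main__":
    # Three disks with relative bandwidths 30, 20 and 10 (made-up values).
    mapping, counts = greedy_map(6, [30, 20, 10])
    print(mapping, counts)
```

Under these assumptions, faster disks receive proportionally more virtual servers, so a full scan finishes on all disks at roughly the same time.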

Original language: English
Title of host publication: Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM
Publisher: IEEE Computer Society
Pages: 212-221
Number of pages: 10
Volume: 2003-January
ISBN (Print): 0769519644
DOIs: https://doi.org/10.1109/SSDM.2003.1214982
Publication status: Published - 2003
Externally published: Yes
Event: 15th International Conference on Scientific and Statistical Database Management, SSDBM 2003 - Cambridge, United States
Duration: 9 Jul 2003 to 11 Jul 2003

Other

Other: 15th International Conference on Scientific and Statistical Database Management, SSDBM 2003
Country: United States
City: Cambridge
Period: 9/7/03 to 11/7/03

Fingerprint

Servers
Virtual reality

Keywords

  • Aggregates
  • Bandwidth
  • Computer science
  • Data visualization
  • Databases
  • Greedy algorithms
  • Image retrieval
  • Information retrieval
  • Multidimensional systems
  • Throughput

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Lee, J., Winslett, M., Ma, X., & Yu, S. (2003). Declustering large multidimensional data sets for range queries over heterogeneous disks. In Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM (Vol. 2003-January, pp. 212-221). [1214982] IEEE Computer Society. https://doi.org/10.1109/SSDM.2003.1214982

@inproceedings{77b3304715b3420c8ea0421904fe53ff,
title = "Declustering large multidimensional data sets for range queries over heterogeneous disks",
abstract = "Declustering is a technique to distribute data sets over multiple disks so that future retrievals can be well balanced over the disks and be performed in parallel. Although clusters often have heterogeneous disks, most declustering work has focused only on homogeneous environments. In this work, we investigate the declustering problem for a heterogeneous disk environment using virtual servers, and propose approaches for deciding the number of virtual servers and the mapping between virtual servers and physical disks. Our experimental results show that by combining our algorithm for choosing the number of virtual servers with a greedy algorithm for mapping virtual servers to disks, users can expect range query retrieval performance within 4{\%} of the optimum achievable in practice on average, in all configurations studied. Compared to an intuitively natural approach to the problem, this represents an improvement of 8-31{\%} in average fetch ratio, as well as a 26-38{\%} reduction in the standard deviation of performance for small queries.",
keywords = "Aggregates, Bandwidth, Computer science, Data visualization, Databases, Greedy algorithms, Image retrieval, Information retrieval, Multidimensional systems, Throughput",
author = "Jonghyun Lee and M. Winslett and Xiaosong Ma and Shengke Yu",
year = "2003",
doi = "10.1109/SSDM.2003.1214982",
language = "English",
isbn = "0769519644",
volume = "2003-January",
pages = "212--221",
booktitle = "Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM",
publisher = "IEEE Computer Society",

}

TY - GEN

T1 - Declustering large multidimensional data sets for range queries over heterogeneous disks

AU - Lee, Jonghyun

AU - Winslett, M.

AU - Ma, Xiaosong

AU - Yu, Shengke

PY - 2003

Y1 - 2003

N2 - Declustering is a technique to distribute data sets over multiple disks so that future retrievals can be well balanced over the disks and be performed in parallel. Although clusters often have heterogeneous disks, most declustering work has focused only on homogeneous environments. In this work, we investigate the declustering problem for a heterogeneous disk environment using virtual servers, and propose approaches for deciding the number of virtual servers and the mapping between virtual servers and physical disks. Our experimental results show that by combining our algorithm for choosing the number of virtual servers with a greedy algorithm for mapping virtual servers to disks, users can expect range query retrieval performance within 4% of the optimum achievable in practice on average, in all configurations studied. Compared to an intuitively natural approach to the problem, this represents an improvement of 8-31% in average fetch ratio, as well as a 26-38% reduction in the standard deviation of performance for small queries.

AB - Declustering is a technique to distribute data sets over multiple disks so that future retrievals can be well balanced over the disks and be performed in parallel. Although clusters often have heterogeneous disks, most declustering work has focused only on homogeneous environments. In this work, we investigate the declustering problem for a heterogeneous disk environment using virtual servers, and propose approaches for deciding the number of virtual servers and the mapping between virtual servers and physical disks. Our experimental results show that by combining our algorithm for choosing the number of virtual servers with a greedy algorithm for mapping virtual servers to disks, users can expect range query retrieval performance within 4% of the optimum achievable in practice on average, in all configurations studied. Compared to an intuitively natural approach to the problem, this represents an improvement of 8-31% in average fetch ratio, as well as a 26-38% reduction in the standard deviation of performance for small queries.

KW - Aggregates

KW - Bandwidth

KW - Computer science

KW - Data visualization

KW - Databases

KW - Greedy algorithms

KW - Image retrieval

KW - Information retrieval

KW - Multidimensional systems

KW - Throughput

UR - http://www.scopus.com/inward/record.url?scp=84943231574&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84943231574&partnerID=8YFLogxK

U2 - 10.1109/SSDM.2003.1214982

DO - 10.1109/SSDM.2003.1214982

M3 - Conference contribution

SN - 0769519644

VL - 2003-January

SP - 212

EP - 221

BT - Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM

PB - IEEE Computer Society

ER -