Abstract
Declustering is a technique to distribute data sets over multiple disks so that future retrievals can be well balanced over the disks and be performed in parallel. Although clusters often have heterogeneous disks, most declustering work has focused only on homogeneous environments. In this work, we investigate the declustering problem for a heterogeneous disk environment using virtual servers, and propose approaches for deciding the number of virtual servers and the mapping between virtual servers and physical disks. Our experimental results show that by combining our algorithm for choosing the number of virtual servers with a greedy algorithm for mapping virtual servers to disks, users can expect range query retrieval performance within 4% of the optimum achievable in practice on average, in all configurations studied. Compared to an intuitively natural approach to the problem, this represents an improvement of 8-31% in average fetch ratio, as well as a 26-38% reduction in the standard deviation of performance for small queries.
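The abstract does not spell out the greedy mapping itself, so the following is only an illustrative sketch of one common greedy strategy, not the authors' algorithm. It assumes every virtual server holds an equal share of the data and that each disk is described by a single bandwidth figure; the function name `greedy_map` and its parameters are hypothetical.

```python
import heapq


def greedy_map(num_virtual_servers, disk_bandwidths):
    """Assign equal-sized virtual servers to heterogeneous disks, one at a time.

    Assumption (not from the paper): the cost of a disk is the number of
    virtual servers it holds divided by its bandwidth. Each virtual server
    goes to the disk whose cost would be lowest after receiving it.
    Returns (mapping, counts): mapping[i] is the disk index for virtual
    server i, counts[d] is the number of virtual servers on disk d.
    """
    if not disk_bandwidths or any(b <= 0 for b in disk_bandwidths):
        raise ValueError("need at least one disk, all with positive bandwidth")

    counts = [0] * len(disk_bandwidths)
    # Heap entries: (cost if this disk takes one more virtual server, disk index).
    heap = [(1.0 / b, d) for d, b in enumerate(disk_bandwidths)]
    heapq.heapify(heap)

    mapping = []
    for _ in range(num_virtual_servers):
        _, d = heapq.heappop(heap)
        counts[d] += 1
        mapping.append(d)
        heapq.heappush(heap, ((counts[d] + 1) / disk_bandwidths[d], d))
    return mapping, counts


if __name__ == "__main__":
    # Hypothetical bandwidths in MB/s: one fast disk and one slow disk.
    mapping, counts = greedy_map(6, [20.0, 10.0])
    print(mapping, counts)
```

Under these assumptions a disk with twice the bandwidth ends up with roughly twice as many virtual servers, which is the kind of proportional balance a heterogeneous declustering scheme aims for.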
Original language | English |
---|---|
Title of host publication | Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM |
Publisher | IEEE Computer Society |
Pages | 212-221 |
Number of pages | 10 |
Volume | 2003-January |
ISBN (Print) | 0769519644 |
DOIs | 10.1109/SSDM.2003.1214982 |
Publication status | Published - 2003 |
Externally published | Yes |
Event | 15th International Conference on Scientific and Statistical Database Management, SSDBM 2003 - Cambridge, United States Duration: 9 Jul 2003 → 11 Jul 2003 |
Other
Other | 15th International Conference on Scientific and Statistical Database Management, SSDBM 2003 |
---|---|
Country | United States |
City | Cambridge |
Period | 9/7/03 → 11/7/03 |
Keywords
- Aggregates
- Bandwidth
- Computer science
- Data visualization
- Databases
- Greedy algorithms
- Image retrieval
- Information retrieval
- Multidimensional systems
- Throughput
ASJC Scopus subject areas
- Software
- Information Systems
Cite this
Declustering large multidimensional data sets for range queries over heterogeneous disks. / Lee, Jonghyun; Winslett, M.; Ma, Xiaosong; Yu, Shengke.
Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM. Vol. 2003-January. IEEE Computer Society, 2003. p. 212-221 1214982.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - Declustering large multidimensional data sets for range queries over heterogeneous disks
AU - Lee, Jonghyun
AU - Winslett, M.
AU - Ma, Xiaosong
AU - Yu, Shengke
PY - 2003
Y1 - 2003
AB - Declustering is a technique to distribute data sets over multiple disks so that future retrievals can be well balanced over the disks and be performed in parallel. Although clusters often have heterogeneous disks, most declustering work has focused only on homogeneous environments. In this work, we investigate the declustering problem for a heterogeneous disk environment using virtual servers, and propose approaches for deciding the number of virtual servers and the mapping between virtual servers and physical disks. Our experimental results show that by combining our algorithm for choosing the number of virtual servers with a greedy algorithm for mapping virtual servers to disks, users can expect range query retrieval performance within 4% of the optimum achievable in practice on average, in all configurations studied. Compared to an intuitively natural approach to the problem, this represents an improvement of 8-31% in average fetch ratio, as well as a 26-38% reduction in the standard deviation of performance for small queries.
KW - Aggregates
KW - Bandwidth
KW - Computer science
KW - Data visualization
KW - Databases
KW - Greedy algorithms
KW - Image retrieval
KW - Information retrieval
KW - Multidimensional systems
KW - Throughput
UR - http://www.scopus.com/inward/record.url?scp=84943231574&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84943231574&partnerID=8YFLogxK
U2 - 10.1109/SSDM.2003.1214982
DO - 10.1109/SSDM.2003.1214982
M3 - Conference contribution
AN - SCOPUS:84943231574
SN - 0769519644
VL - 2003-January
SP - 212
EP - 221
BT - Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM
PB - IEEE Computer Society
ER -