Data space mapping for efficient I/O in large multi-dimensional databases

Hakan Ferhatosmanoglu, Aravind Ramachandran, Divyakant Agrawal, Amr El Abbadi

Research output: Contribution to journal (Article)

Abstract

In this paper, we propose data space mapping techniques for storage and retrieval in multi-dimensional databases on multi-disk architectures. We identify the important factors for efficient multi-disk searching of multi-dimensional data and develop secondary storage organization and retrieval techniques that directly address these factors. We focus especially on high-dimensional data, where none of the current approaches are effective. In contrast to current declustering techniques, the storage techniques in this paper consider both inter- and intra-disk organization of the data. The data space is first partitioned into buckets; the buckets are then declustered across multiple disks and clustered within each disk. Queries are executed through bucket identification techniques that locate the pages. One of the partitioning techniques we discuss is especially practical for high-dimensional data, and our disk and page allocation techniques are optimal with respect to the number of I/O accesses and seek times. We provide experimental results on two real high-dimensional datasets that support our claims.
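
The pipeline outlined in the abstract (partition the data space into buckets, decluster the buckets across disks, cluster them within each disk, then answer queries by identifying the relevant buckets) can be illustrated with a minimal Python sketch. This is not the paper's method: the uniform grid partitioning, the coordinate-sum-modulo disk assignment, and the lexicographic ordering of buckets within a disk are assumptions chosen for brevity, and every function name below is hypothetical.

```python
from collections import defaultdict
from itertools import product


def bucket_of(point, cells_per_dim):
    """Map a point in [0, 1)^d to its grid-cell (bucket) coordinates."""
    return tuple(min(int(x * cells_per_dim), cells_per_dim - 1) for x in point)


def disk_of(bucket, num_disks):
    """Assumed declustering rule: sum of cell coordinates modulo the disk count."""
    return sum(bucket) % num_disks


def build_layout(points, cells_per_dim, num_disks):
    """Return {disk: [(bucket, [points, ...]), ...]} with buckets sorted per disk."""
    buckets = defaultdict(list)
    for p in points:
        buckets[bucket_of(p, cells_per_dim)].append(p)
    disks = defaultdict(list)
    # Assumed intra-disk clustering: store buckets in lexicographic order so
    # that neighbouring cells tend to sit next to each other on a disk.
    for b in sorted(buckets):
        disks[disk_of(b, num_disks)].append((b, buckets[b]))
    return dict(disks)


def buckets_for_range(low, high, cells_per_dim):
    """Bucket identification: list the grid cells overlapping the box [low, high]."""
    lo = bucket_of(low, cells_per_dim)
    hi = bucket_of(high, cells_per_dim)
    return list(product(*(range(a, b + 1) for a, b in zip(lo, hi))))


def disks_touched(low, high, cells_per_dim, num_disks):
    """Count how many candidate buckets each disk must read for a range query."""
    load = defaultdict(int)
    for b in buckets_for_range(low, high, cells_per_dim):
        load[disk_of(b, num_disks)] += 1
    return dict(load)
```

Calling `disks_touched` for a query box shows how evenly the candidate buckets spread over the disks under the assumed rule; a more balanced spread means more of the reads can proceed in parallel.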

Original language: English
Pages (from-to): 83-103
Number of pages: 21
Journal: Information Systems
Volume: 32
Issue number: 1
DOI: https://doi.org/10.1016/j.is.2005.06.001
Publication status: Published - 1 Mar 2007
Externally published: Yes

Keywords

  • Data space mapping
  • Disk and page allocation
  • High dimensional data
  • Multi-disk architectures
  • Parallel I/O
  • Performance
  • Space partitioning
  • Storage

ASJC Scopus subject areas

  • Management Information Systems
  • Management of Technology and Innovation
  • Hardware and Architecture
  • Information Systems
  • Software

Cite this

Ferhatosmanoglu, H., Ramachandran, A., Agrawal, D., & El Abbadi, A. (2007). Data space mapping for efficient I/O in large multi-dimensional databases. Information Systems, 32(1), 83-103. https://doi.org/10.1016/j.is.2005.06.001

Data space mapping for efficient I/O in large multi-dimensional databases. / Ferhatosmanoglu, Hakan; Ramachandran, Aravind; Agrawal, Divyakant; El Abbadi, Amr.

In: Information Systems, Vol. 32, No. 1, 01.03.2007, p. 83-103.

Ferhatosmanoglu, H, Ramachandran, A, Agrawal, D & El Abbadi, A 2007, 'Data space mapping for efficient I/O in large multi-dimensional databases', Information Systems, vol. 32, no. 1, pp. 83-103. https://doi.org/10.1016/j.is.2005.06.001
Ferhatosmanoglu H, Ramachandran A, Agrawal D, El Abbadi A. Data space mapping for efficient I/O in large multi-dimensional databases. Information Systems. 2007 Mar 1;32(1):83-103. https://doi.org/10.1016/j.is.2005.06.001
@article{5c74d775cb764dc69785d3bd9f6aba39,
title = "Data space mapping for efficient I/O in large multi-dimensional databases",
keywords = "Data space mapping, Disk and page allocation, High dimensional data, Multi-disk architectures, Parallel I/O, Performance, Space partitioning, Storage",
author = "Hakan Ferhatosmanoglu and Aravind Ramachandran and Divyakant Agrawal and {El Abbadi}, Amr",
year = "2007",
month = "3",
day = "1",
doi = "10.1016/j.is.2005.06.001",
language = "English",
volume = "32",
pages = "83--103",
journal = "Information Systems",
issn = "0306-4379",
publisher = "Elsevier Limited",
number = "1",

}

TY  - JOUR
T1  - Data space mapping for efficient I/O in large multi-dimensional databases
AU  - Ferhatosmanoglu, Hakan
AU  - Ramachandran, Aravind
AU  - Agrawal, Divyakant
AU  - El Abbadi, Amr
PY  - 2007/3/1
Y1  - 2007/3/1
KW  - Data space mapping
KW  - Disk and page allocation
KW  - High dimensional data
KW  - Multi-disk architectures
KW  - Parallel I/O
KW  - Performance
KW  - Space partitioning
KW  - Storage
UR  - http://www.scopus.com/inward/record.url?scp=33749575621&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=33749575621&partnerID=8YFLogxK
U2  - 10.1016/j.is.2005.06.001
DO  - 10.1016/j.is.2005.06.001
M3  - Article
VL  - 32
SP  - 83
EP  - 103
JO  - Information Systems
JF  - Information Systems
SN  - 0306-4379
IS  - 1
ER  -