Accessing scientific data: Simpler is better

Mirek Riedewald, Divyakant Agrawal, Amr El Abbadi, Flip Korn

Research output: Contribution to journal (Article)

5 Citations (Scopus)

Abstract

A variety of index structures has been proposed to support fast access to, and summarization of, large multidimensional data sets. Some of these indices are fairly involved, hence few are used in practice. In this paper we examine how to reduce the I/O cost by taking full advantage of recent trends in hard disk development that favor reading large chunks of consecutive disk blocks over seeking and searching. We present the Multiresolution File Scan (MFS) approach, which is based on a surprisingly simple and flexible data structure that outperforms sophisticated multidimensional indices, even if they are bulk-loaded and hence optimized for query processing. Our approach also has the advantage that it can incorporate a priori knowledge about the query workload. It readily supports summarization using distributive (e.g., count, sum, max, min) and algebraic (e.g., avg) aggregate operators.
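The distinction between distributive and algebraic aggregates mentioned in the abstract can be illustrated with a small Python sketch. It is not the MFS data structure from the paper, whose details are not given on this page; the names `Partial`, `merge`, `average`, and `coarsen` are hypothetical and chosen only for illustration. The sketch assumes that each fine-resolution cell stores count, sum, min, and max, that coarser cells are obtained by merging adjacent fine cells, and that avg is derived from the stored sum and count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partial:
    """Pre-aggregated summary of one cell (hypothetical layout)."""
    count: int
    total: float
    lo: float
    hi: float


def from_values(values):
    """Summarize the raw values of one non-empty cell."""
    return Partial(len(values), sum(values), min(values), max(values))


def merge(a, b):
    """Distributive step: combine two summaries without the raw data."""
    return Partial(a.count + b.count, a.total + b.total,
                   min(a.lo, b.lo), max(a.hi, b.hi))


def average(p):
    """Algebraic aggregate: avg is derived from two distributive ones."""
    return p.total / p.count


def coarsen(cells, factor):
    """Merge each run of `factor` adjacent cells into one coarser cell."""
    coarse = []
    for start in range(0, len(cells), factor):
        group = cells[start:start + factor]
        acc = group[0]
        for cell in group[1:]:
            acc = merge(acc, cell)
        coarse.append(acc)
    return coarse


# Four fine-resolution cells, then a resolution that is two times coarser.
fine = [from_values(chunk)
        for chunk in ([1.0, 2.0], [3.0], [4.0, 5.0, 6.0], [7.0])]
coarse = coarsen(fine, 2)
overall = merge(coarse[0], coarse[1])
print(coarse[0], average(overall))
```

Because count, sum, min, and max can be merged from partial results, a coarser resolution can be built from a finer one without revisiting the raw data, and avg follows from the merged sum and count.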

Original language: English
Pages (from-to): 214-232
Number of pages: 19
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2750
Publication status: Published - 1 Dec 2003
Externally published: Yes

Fingerprint

Query processing
Hard disk storage
Workload
Data structures
Reading
Summarization
Costs and Cost Analysis
Costs
Flexible Structure
Multidimensional Data
Large Data
Min-max
Multiresolution
Consecutive
Count
Query
Operator
Datasets

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Accessing scientific data: Simpler is better. / Riedewald, Mirek; Agrawal, Divyakant; El Abbadi, Amr; Korn, Flip.

In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 2750, 01.12.2003, p. 214-232.

@article{472b4a79d23c43bcac6301c32e57743d,
title = "Accessing scientific data: Simpler is better",
abstract = "A variety of index structures has been proposed for supporting fast access and summarization of large multidimensional data sets. Some of these indices are fairly involved, hence few are used in practice. In this paper we examine how to reduce the I/O cost by taking full advantage of recent trends in hard disk development which favor reading large chunks of consecutive disk blocks over seeking and searching. We present the Multiresolution File Scan (MFS) approach which is based on a surprisingly simple and flexible data structure which outperforms sophisticated multidimensional indices, even if they are bulk-loaded and hence optimized for query processing. Our approach also has the advantage that it can incorporate a priori knowledge about the query workload. It readily supports summarization using distributive (e.g., count, sum, max, min) and algebraic (e.g., avg) aggregate operators.",
author = "Mirek Riedewald and Divyakant Agrawal and {El Abbadi}, Amr and Flip Korn",
year = "2003",
month = "12",
day = "1",
language = "English",
volume = "2750",
pages = "214--232",
journal = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
issn = "0302-9743",
publisher = "Springer Verlag",
}

TY - JOUR

T1 - Accessing scientific data

T2 - Simpler is better

AU - Riedewald, Mirek

AU - Agrawal, Divyakant

AU - El Abbadi, Amr

AU - Korn, Flip

PY - 2003/12/1

Y1 - 2003/12/1

N2 - A variety of index structures has been proposed for supporting fast access and summarization of large multidimensional data sets. Some of these indices are fairly involved, hence few are used in practice. In this paper we examine how to reduce the I/O cost by taking full advantage of recent trends in hard disk development which favor reading large chunks of consecutive disk blocks over seeking and searching. We present the Multiresolution File Scan (MFS) approach which is based on a surprisingly simple and flexible data structure which outperforms sophisticated multidimensional indices, even if they are bulk-loaded and hence optimized for query processing. Our approach also has the advantage that it can incorporate a priori knowledge about the query workload. It readily supports summarization using distributive (e.g., count, sum, max, min) and algebraic (e.g., avg) aggregate operators.

UR - http://www.scopus.com/inward/record.url?scp=35248858865&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=35248858865&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:35248858865

VL - 2750

SP - 214

EP - 232

JO - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

JF - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SN - 0302-9743

ER -