Exploring spatial datasets with histograms

Chengyu Sun, Nagender Bandi, Divyakant Agrawal, Amr El Abbadi

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

As online spatial datasets grow both in number and sophistication, it becomes increasingly difficult for users to decide whether a dataset is suitable for their tasks, especially when they do not have prior knowledge of the dataset. In this paper, we propose browsing as an effective and efficient way to explore the content of a spatial dataset. Browsing allows users to view the size of a result set before evaluating the query at the database, thereby avoiding zero-hit/mega-hit queries and saving time and resources. Although the underlying technique supporting browsing is similar to range query aggregation and selectivity estimation, spatial dataset browsing poses some unique challenges. In this paper, we identify a set of spatial relations that need to be supported in browsing applications, namely, the contains, contained and the overlap relations. We prove a lower bound on the storage required to answer queries about the contains relation accurately at a given resolution. We then present three storage-efficient approximation algorithms which we believe to be the first to estimate query results about these spatial relations. We evaluate these algorithms with both synthetic and real world datasets and show that they provide highly accurate estimates for datasets with various characteristics.
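To make the three relations named in the abstract concrete, the following is a minimal brute-force sketch that counts, for one query window, how many objects fall under each relation. It assumes axis-aligned closed rectangles and assumes that "contained" means the object lies inside the query window while "contains" means the object covers it; the abstract does not spell out these definitions, and the names `Rect`, `relation`, and `browse` are hypothetical. This is an exact baseline that scans every object, not one of the paper's histogram-based approximation algorithms, which aim to estimate such counts from a compact summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle with x1 <= x2 and y1 <= y2 (illustrative type)."""
    x1: float
    y1: float
    x2: float
    y2: float


def relation(obj: Rect, query: Rect) -> str:
    """Classify obj relative to query under the assumed definitions."""
    # No shared point at all: the object is irrelevant to this query.
    if (obj.x2 < query.x1 or obj.x1 > query.x2
            or obj.y2 < query.y1 or obj.y1 > query.y2):
        return "disjoint"
    # Object lies entirely inside the query window.
    if (query.x1 <= obj.x1 and obj.x2 <= query.x2
            and query.y1 <= obj.y1 and obj.y2 <= query.y2):
        return "contained"
    # Object entirely covers the query window.
    if (obj.x1 <= query.x1 and query.x2 <= obj.x2
            and obj.y1 <= query.y1 and query.y2 <= obj.y2):
        return "contains"
    # Intersects the window without either rectangle enclosing the other.
    return "overlap"


def browse(objects, query):
    """Return the exact result-set size per relation for one query window."""
    counts = {"contains": 0, "contained": 0, "overlap": 0, "disjoint": 0}
    for obj in objects:
        counts[relation(obj, query)] += 1
    return counts


if __name__ == "__main__":
    data = [
        Rect(1, 1, 2, 2),      # inside the window
        Rect(-1, -1, 5, 5),    # covers the window
        Rect(3, 3, 6, 6),      # partially intersects the window
        Rect(10, 10, 11, 11),  # far away
    ]
    print(browse(data, Rect(0, 0, 4, 4)))
```

A browsing interface would show these per-relation counts to the user before the full query runs, so that zero-hit and mega-hit queries can be spotted early.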

Original language: English
Pages (from-to): 57-88
Number of pages: 32
Journal: Distributed and Parallel Databases
Volume: 20
Issue number: 1
DOIs: https://doi.org/10.1007/s10619-006-8576-x
Publication status: Published - 1 Jul 2006
Externally published: Yes

Fingerprint

Approximation algorithms
Browsing
Histogram
Agglomeration
Query
Spatial Relations
Hits
Range Query
Selectivity
Prior Knowledge
Estimate
Overlap
Aggregation
Efficient Algorithms
Lower bound
Resources
Evaluate
Zero

Keywords

  • Databases
  • Geographic information systems
  • Query processing

ASJC Scopus subject areas

  • Information Systems
  • Theoretical Computer Science
  • Computational Theory and Mathematics

Cite this

Exploring spatial datasets with histograms. / Sun, Chengyu; Bandi, Nagender; Agrawal, Divyakant; El Abbadi, Amr.

In: Distributed and Parallel Databases, Vol. 20, No. 1, 01.07.2006, p. 57-88.

Sun, C, Bandi, N, Agrawal, D & El Abbadi, A 2006, 'Exploring spatial datasets with histograms', Distributed and Parallel Databases, vol. 20, no. 1, pp. 57-88. https://doi.org/10.1007/s10619-006-8576-x
Sun, Chengyu ; Bandi, Nagender ; Agrawal, Divyakant ; El Abbadi, Amr. / Exploring spatial datasets with histograms. In: Distributed and Parallel Databases. 2006 ; Vol. 20, No. 1. pp. 57-88.
@article{a19f8cfb295e4811b05a6dfe4a8fb6eb,
title = "Exploring spatial datasets with histograms",
keywords = "Databases, Geographic information systems, Query processing",
author = "Chengyu Sun and Nagender Bandi and Divyakant Agrawal and {El Abbadi}, Amr",
year = "2006",
month = "7",
day = "1",
doi = "10.1007/s10619-006-8576-x",
language = "English",
volume = "20",
pages = "57--88",
journal = "Distributed and Parallel Databases",
issn = "0926-8782",
publisher = "Springer Netherlands",
number = "1",

}

TY - JOUR

T1 - Exploring spatial datasets with histograms

AU - Sun, Chengyu

AU - Bandi, Nagender

AU - Agrawal, Divyakant

AU - El Abbadi, Amr

PY - 2006/7/1

Y1 - 2006/7/1

AB - As online spatial datasets grow both in number and sophistication, it becomes increasingly difficult for users to decide whether a dataset is suitable for their tasks, especially when they do not have prior knowledge of the dataset. In this paper, we propose browsing as an effective and efficient way to explore the content of a spatial dataset. Browsing allows users to view the size of a result set before evaluating the query at the database, thereby avoiding zero-hit/mega-hit queries and saving time and resources. Although the underlying technique supporting browsing is similar to range query aggregation and selectivity estimation, spatial dataset browsing poses some unique challenges. In this paper, we identify a set of spatial relations that need to be supported in browsing applications, namely, the contains, contained and the overlap relations. We prove a lower bound on the storage required to answer queries about the contains relation accurately at a given resolution. We then present three storage-efficient approximation algorithms which we believe to be the first to estimate query results about these spatial relations. We evaluate these algorithms with both synthetic and real world datasets and show that they provide highly accurate estimates for datasets with various characteristics.

KW - Databases

KW - Geographic information systems

KW - Query processing

UR - http://www.scopus.com/inward/record.url?scp=33744937904&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33744937904&partnerID=8YFLogxK

U2 - 10.1007/s10619-006-8576-x

DO - 10.1007/s10619-006-8576-x

M3 - Article

VL - 20

SP - 57

EP - 88

JO - Distributed and Parallel Databases

JF - Distributed and Parallel Databases

SN - 0926-8782

IS - 1

ER -