Exploring spatial datasets with histograms

Chengyu Sun, Nagender Bandi, Divyakant Agrawal, Amr El Abbadi

Research output: Contribution to journal (Article)

9 Citations (Scopus)

Abstract

As online spatial datasets grow in both number and sophistication, it becomes increasingly difficult for users to decide whether a dataset suits their tasks, especially when they have no prior knowledge of it. In this paper, we propose browsing as an effective and efficient way to explore the content of a spatial dataset. Browsing lets users view the size of a result set before the query is evaluated at the database, thereby avoiding zero-hit and mega-hit queries and saving time and resources. Although the technique underlying browsing is similar to range query aggregation and selectivity estimation, spatial dataset browsing poses some unique challenges. We identify a set of spatial relations that browsing applications need to support, namely the contains, contained, and overlap relations. We prove a lower bound on the storage required to answer queries about the contains relation accurately at a given resolution. We then present three storage-efficient approximation algorithms, which we believe to be the first to estimate query results for these spatial relations. We evaluate these algorithms on both synthetic and real-world datasets and show that they provide highly accurate estimates for datasets with various characteristics.
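
To make the quantities that browsing reports concrete, the sketch below counts, by brute force, how many objects fall under each relation for one query rectangle. It is an illustration only and not one of the paper's histogram-based approximation algorithms, which estimate these counts without scanning the data. The names `Rect`, `relation`, and `browse_counts` are hypothetical, objects are assumed to be axis-aligned rectangles, and the reading of the relations (the query contains the object, the query is contained in the object, or the two only partially intersect) is an assumption, since the abstract does not define them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle with corners (x1, y1) and (x2, y2)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def covers(self, other: "Rect") -> bool:
        # True when `other` lies entirely inside this rectangle.
        return (self.x1 <= other.x1 and self.y1 <= other.y1
                and self.x2 >= other.x2 and self.y2 >= other.y2)

    def intersects(self, other: "Rect") -> bool:
        # True when the two interiors share some area.
        return (self.x1 < other.x2 and other.x1 < self.x2
                and self.y1 < other.y2 and other.y1 < self.y2)


def relation(query: Rect, obj: Rect) -> str:
    """Classify `obj` relative to `query` (assumed definitions)."""
    if query.covers(obj):
        return "contains"    # query contains the object
    if obj.covers(query):
        return "contained"   # query is contained in the object
    if query.intersects(obj):
        return "overlap"     # partial intersection only
    return "disjoint"


def browse_counts(query: Rect, objects: list[Rect]) -> dict[str, int]:
    """Exact result-set sizes per relation, computed by a full scan."""
    counts = Counter(relation(query, o) for o in objects)
    return {k: counts.get(k, 0)
            for k in ("contains", "contained", "overlap", "disjoint")}


objects = [
    Rect(1, 1, 2, 2),      # inside the query
    Rect(-1, -1, 10, 10),  # encloses the query
    Rect(3, 3, 6, 6),      # partially intersects the query
    Rect(8, 8, 9, 9),      # far from the query
]
query = Rect(0, 0, 4, 4)
result = browse_counts(query, objects)
```

A browsing interface would show such counts before the query runs, so a user can spot a zero-hit or mega-hit query in advance. The full scan here costs time linear in the dataset size, which is the cost a compact summary structure is meant to avoid.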

Original language: English
Pages (from-to): 57-88
Number of pages: 32
Journal: Distributed and Parallel Databases
Volume: 20
Issue number: 1
Publication status: Published - 1 Jul 2006

Keywords

  • Databases
  • Geographic information systems
  • Query processing

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
  • Information Systems and Management
