Aggregate estimation over dynamic hidden web databases

Weimo Liu, Saravanan Thirumuruganathan, Nan Zhang, Gautam Das

Research output: Contribution to journal (Article)

9 Citations (Scopus)

Abstract

Many databases on the web are "hidden" behind (i.e., accessible only through) their restrictive, form-like search interfaces. Recent studies have shown that it is possible to estimate aggregate query answers over such hidden web databases by issuing a small number of carefully designed search queries through the restrictive web interface. A problem with these existing studies, however, is that they all assume the underlying database to be static, while most real-world web databases (e.g., Amazon, eBay) are frequently updated. In this paper, we study the novel problem of estimating/tracking aggregates over dynamic hidden web databases while adhering to the stringent query-cost limitation they enforce (e.g., at most 1,000 search queries per day). Theoretical analysis and extensive real-world experiments demonstrate the effectiveness of our proposed algorithms and their superiority over baseline solutions (e.g., the repeated execution of algorithms designed for static web databases).

Original language: English
Pages (from-to): 1107-1118
Number of pages: 12
Journal: Proceedings of the VLDB Endowment
Volume: 7
Issue number: 12
DOI: 10.14778/2732977.2732985
Publication status: Published - 1 Jan 2014
Externally published: Yes

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

Aggregate estimation over dynamic hidden web databases. / Liu, Weimo; Thirumuruganathan, Saravanan; Zhang, Nan; Das, Gautam.

In: Proceedings of the VLDB Endowment, Vol. 7, No. 12, 01.01.2014, p. 1107-1118.

@article{6930aa2bbd5246e3804c60917cc3dd2c,
title = "Aggregate estimation over dynamic hidden web databases",
abstract = "Many databases on the web are {"}hidden{"} behind (i.e., accessible only through) their restrictive, form-like search interfaces. Recent studies have shown that it is possible to estimate aggregate query answers over such hidden web databases by issuing a small number of carefully designed search queries through the restrictive web interface. A problem with these existing studies, however, is that they all assume the underlying database to be static, while most real-world web databases (e.g., Amazon, eBay) are frequently updated. In this paper, we study the novel problem of estimating/tracking aggregates over dynamic hidden web databases while adhering to the stringent query-cost limitation they enforce (e.g., at most 1,000 search queries per day). Theoretical analysis and extensive real-world experiments demonstrate the effectiveness of our proposed algorithms and their superiority over baseline solutions (e.g., the repeated execution of algorithms designed for static web databases).",
author = "Weimo Liu and Saravanan Thirumuruganathan and Nan Zhang and Gautam Das",
year = "2014",
month = "1",
day = "1",
doi = "10.14778/2732977.2732985",
language = "English",
volume = "7",
pages = "1107--1118",
journal = "Proceedings of the VLDB Endowment",
issn = "2150-8097",
publisher = "Very Large Data Base Endowment Inc.",
number = "12",
}

TY - JOUR

T1 - Aggregate estimation over dynamic hidden web databases

AU - Liu, Weimo

AU - Thirumuruganathan, Saravanan

AU - Zhang, Nan

AU - Das, Gautam

PY - 2014/1/1

Y1 - 2014/1/1

AB - Many databases on the web are "hidden" behind (i.e., accessible only through) their restrictive, form-like search interfaces. Recent studies have shown that it is possible to estimate aggregate query answers over such hidden web databases by issuing a small number of carefully designed search queries through the restrictive web interface. A problem with these existing studies, however, is that they all assume the underlying database to be static, while most real-world web databases (e.g., Amazon, eBay) are frequently updated. In this paper, we study the novel problem of estimating/tracking aggregates over dynamic hidden web databases while adhering to the stringent query-cost limitation they enforce (e.g., at most 1,000 search queries per day). Theoretical analysis and extensive real-world experiments demonstrate the effectiveness of our proposed algorithms and their superiority over baseline solutions (e.g., the repeated execution of algorithms designed for static web databases).

UR - http://www.scopus.com/inward/record.url?scp=84905106850&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84905106850&partnerID=8YFLogxK

U2 - 10.14778/2732977.2732985

DO - 10.14778/2732977.2732985

M3 - Article

VL - 7

SP - 1107

EP - 1118

JO - Proceedings of the VLDB Endowment

JF - Proceedings of the VLDB Endowment

SN - 2150-8097

IS - 12

ER -