Abstract
Many databases on the web are "hidden" behind (i.e., accessible only through) their restrictive, form-like search interfaces. Recent studies have shown that it is possible to estimate aggregate query answers over such hidden web databases by issuing a small number of carefully designed search queries through the restrictive web interface. A problem with this existing work, however, is that it assumes the underlying database to be static, while most real-world web databases (e.g., Amazon, eBay) are frequently updated. In this paper, we study the novel problem of estimating and tracking aggregates over dynamic hidden web databases while adhering to the stringent query-cost limitation they enforce (e.g., at most 1,000 search queries per day). Theoretical analysis and extensive real-world experiments demonstrate the effectiveness of our proposed algorithms and their superiority over baseline solutions (e.g., the repeated execution of algorithms designed for static web databases).
Original language | English |
---|---|
Pages (from-to) | 1107-1118 |
Number of pages | 12 |
Journal | Proceedings of the VLDB Endowment |
Volume | 7 |
Issue number | 12 |
DOIs | 10.14778/2732977.2732985 |
Publication status | Published - 1 Jan 2014 |
Externally published | Yes |
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- Computer Science (all)
Aggregate estimation over dynamic hidden web databases. / Liu, Weimo; Thirumuruganathan, Saravanan; Zhang, Nan; Das, Gautam.
In: Proceedings of the VLDB Endowment, Vol. 7, No. 12, 01.01.2014, p. 1107-1118.
Research output: Contribution to journal › Article
TY - JOUR
T1 - Aggregate estimation over dynamic hidden web databases
AU - Liu, Weimo
AU - Thirumuruganathan, Saravanan
AU - Zhang, Nan
AU - Das, Gautam
PY - 2014/1/1
Y1 - 2014/1/1
AB - Many databases on the web are "hidden" behind (i.e., accessible only through) their restrictive, form-like search interfaces. Recent studies have shown that it is possible to estimate aggregate query answers over such hidden web databases by issuing a small number of carefully designed search queries through the restrictive web interface. A problem with this existing work, however, is that it assumes the underlying database to be static, while most real-world web databases (e.g., Amazon, eBay) are frequently updated. In this paper, we study the novel problem of estimating and tracking aggregates over dynamic hidden web databases while adhering to the stringent query-cost limitation they enforce (e.g., at most 1,000 search queries per day). Theoretical analysis and extensive real-world experiments demonstrate the effectiveness of our proposed algorithms and their superiority over baseline solutions (e.g., the repeated execution of algorithms designed for static web databases).
UR - http://www.scopus.com/inward/record.url?scp=84905106850&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84905106850&partnerID=8YFLogxK
U2 - 10.14778/2732977.2732985
DO - 10.14778/2732977.2732985
M3 - Article
AN - SCOPUS:84905106850
VL - 7
SP - 1107
EP - 1118
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
SN - 2150-8097
IS - 12
ER -