Rank discovery from web databases

Saravanan Thirumuruganathan, Nan Zhang, Gautam Das

Research output: Contribution to journalArticle

3 Citations (Scopus)

Abstract

Many web databases are only accessible through a proprietary search interface which allows users to form a query by entering the desired values for a few attributes. After receiving a query, the system returns the top-k matching tuples according to a pre-determined ranking function. Since the rank of a tuple largely determines the attention it receives from website users, ranking information for any tuple - not just the top-ranked ones - is often of significant interest to third parties such as sellers, customers, market researchers and investors. In this paper, we define a novel problem of rank discovery over hidden web databases. We introduce a taxonomy of ranking functions, and show that different types of ranking functions require fundamentally different approaches for rank discovery. Our technical contributions include principled and efficient randomized algorithms for estimating the rank of a given tuple, as well as negative results which demonstrate the inefficiency of any deterministic algorithm. We show extensive experimental results over real-world databases, including an online experiment at Amazon.com, which illustrates the effectiveness of our proposed techniques.

Original languageEnglish
Pages (from-to)1582-1593
Number of pages12
JournalProceedings of the VLDB Endowment
Volume6
Issue number13
DOIs
Publication statusPublished - Aug 2013

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science(all)

Fingerprint Dive into the research topics of 'Rank discovery from web databases'. Together they form a unique fingerprint.

  • Cite this