Rank discovery from web databases

Saravanan Thirumuruganathan, Nan Zhang, Gautam Das

Research output: Contribution to journal, Article

3 Citations (Scopus)

Abstract

Many web databases are only accessible through a proprietary search interface that allows users to form a query by entering the desired values for a few attributes. After receiving a query, the system returns the top-k matching tuples according to a pre-determined ranking function. Since the rank of a tuple largely determines the attention it receives from website users, ranking information for any tuple (not just the top-ranked ones) is often of significant interest to third parties such as sellers, customers, market researchers and investors. In this paper, we define a novel problem of rank discovery over hidden web databases. We introduce a taxonomy of ranking functions, and show that different types of ranking functions require fundamentally different approaches for rank discovery. Our technical contributions include principled and efficient randomized algorithms for estimating the rank of a given tuple, as well as negative results which demonstrate the inefficiency of any deterministic algorithm. We show extensive experimental results over real-world databases, including an online experiment at Amazon.com, which illustrate the effectiveness of our proposed techniques.
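The setting described in the abstract can be illustrated with a toy model. The sketch below is not the paper's algorithm; it only mimics a top-k search form over a hidden table and shows why a tuple's rank is not directly observable once it falls outside the top k. The class `TopKInterface`, the helper `visible_position`, the attribute names, and the scoring by `sales` are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


class TopKInterface:
    """Toy stand-in for a hidden web database's search form (hypothetical)."""

    def __init__(self, rows: List[Row], score: Callable[[Row], float], k: int):
        self._rows = rows      # hidden from the third party
        self._score = score    # hidden, pre-determined ranking function
        self.k = k

    def query(self, **predicates) -> List[Row]:
        """Return at most k rows matching all attribute=value predicates, best first."""
        matches = [r for r in self._rows
                   if all(r.get(a) == v for a, v in predicates.items())]
        matches.sort(key=self._score, reverse=True)
        return matches[:self.k]

    def true_rank(self, row_id: int) -> int:
        """Ground truth for checking only; a real third party cannot call this."""
        ordered = sorted(self._rows, key=self._score, reverse=True)
        return 1 + [r["id"] for r in ordered].index(row_id)


def visible_position(results: List[Row], row_id: int) -> Optional[int]:
    """Position of a row inside one result page, or None if it was cut off."""
    for pos, r in enumerate(results, start=1):
        if r["id"] == row_id:
            return pos
    return None


# Made-up data: five tuples ranked by a made-up "sales" attribute.
rows = [
    {"id": 1, "brand": "A", "color": "red", "sales": 90},
    {"id": 2, "brand": "A", "color": "blue", "sales": 80},
    {"id": 3, "brand": "B", "color": "red", "sales": 70},
    {"id": 4, "brand": "B", "color": "blue", "sales": 60},
    {"id": 5, "brand": "A", "color": "red", "sales": 50},
]
db = TopKInterface(rows, score=lambda r: r["sales"], k=2)

broad = db.query()            # no predicates: only the global top 2 are shown
narrow = db.query(brand="B")  # narrower query: tuple 4 becomes visible

print(visible_position(broad, 4))   # tuple 4 is cut off by the top-k limit
print(visible_position(narrow, 4))  # its position among brand "B" tuples only
```

In this toy model the broad query hides tuple 4 entirely, and the narrow query reveals only its position among the matching tuples, not its rank in the whole database. Recovering that overall rank from such partial views is the kind of problem the abstract calls rank discovery.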

Original language: English
Pages (from-to): 1582-1593
Number of pages: 12
Journal: Proceedings of the VLDB Endowment
Volume: 6
Issue number: 13
DOI: 10.14778/2536258.2536269
Publication status: Published - 1 Jan 2013
Externally published: Yes

Fingerprint

Taxonomies
Websites
Experiments

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

Rank discovery from web databases. / Thirumuruganathan, Saravanan; Zhang, Nan; Das, Gautam.

In: Proceedings of the VLDB Endowment, Vol. 6, No. 13, 01.01.2013, p. 1582-1593.

Thirumuruganathan, Saravanan ; Zhang, Nan ; Das, Gautam. / Rank discovery from web databases. In: Proceedings of the VLDB Endowment. 2013 ; Vol. 6, No. 13. pp. 1582-1593.
@article{8e5d71c8598f4bdfb0ba4ce8e1c342a7,
title = "Rank discovery from web databases",
author = "Saravanan Thirumuruganathan and Nan Zhang and Gautam Das",
year = "2013",
month = "1",
day = "1",
doi = "10.14778/2536258.2536269",
language = "English",
volume = "6",
pages = "1582--1593",
journal = "Proceedings of the VLDB Endowment",
issn = "2150-8097",
publisher = "Very Large Data Base Endowment Inc.",
number = "13",

}

TY - JOUR

T1 - Rank discovery from web databases

AU - Thirumuruganathan, Saravanan

AU - Zhang, Nan

AU - Das, Gautam

PY - 2013/1/1

Y1 - 2013/1/1

UR - http://www.scopus.com/inward/record.url?scp=84891101312&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84891101312&partnerID=8YFLogxK

U2 - 10.14778/2536258.2536269

DO - 10.14778/2536258.2536269

M3 - Article

VL - 6

SP - 1582

EP - 1593

JO - Proceedings of the VLDB Endowment

JF - Proceedings of the VLDB Endowment

SN - 2150-8097

IS - 13

ER -