Discovering the skyline of web databases

Abolfazl Asudeh, Saravanan Thirumuruganathan, Nan Zhang, Gautam Das

Research output: Contribution to journal, Conference article

14 Citations (Scopus)

Abstract

Many web databases are "hidden" behind proprietary search interfaces that enforce the top-k output constraint, i.e., each query returns at most k of all matching tuples, preferentially selected and returned according to a proprietary ranking function. In this paper, we initiate research into the novel problem of skyline discovery over top-k hidden web databases. Since skyline tuples provide critical insights into the database and include the top-ranked tuple for every possible ranking function following the monotonic order of attribute values, skyline discovery from a hidden web database can enable a wide variety of innovative third-party applications over one or multiple web databases. Our research in the paper shows that the critical factor affecting the cost of skyline discovery is the type of search interface controls provided by the website. As such, we develop efficient algorithms for the three most popular types, i.e., one-ended range, free range and point predicates, and then combine them to support web databases that feature a mixture of these types. Rigorous theoretical analysis and extensive real-world online and offline experiments demonstrate the effectiveness of our proposed techniques and their superiority over baseline solutions.
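
The two notions the abstract relies on, the skyline and the top-k output constraint, can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes full local access to a toy table, assumes that smaller attribute values are preferred, and uses hypothetical names (`dominates`, `skyline`, `top_k_query`, `DB`) chosen only for illustration. It shows a naive skyline computation and a mock interface that accepts one-ended range predicates and returns at most k matching tuples.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, ...]


def dominates(a: Row, b: Row) -> bool:
    """True if a is no worse than b on every attribute and strictly
    better on at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(rows: Sequence[Row]) -> List[Row]:
    """Naive O(n^2) skyline: keep every row that no other row dominates."""
    return [r for r in rows if not any(dominates(o, r) for o in rows)]


def top_k_query(
    rows: Sequence[Row],
    upper_bounds: Row,
    k: int,
    score: Callable[[Row], float],
) -> List[Row]:
    """Mock hidden-database interface (hypothetical): filter with one-ended
    range predicates A_i <= upper_bounds[i], rank by `score`, return at most k."""
    matches = [r for r in rows if all(x <= ub for x, ub in zip(r, upper_bounds))]
    return sorted(matches, key=score)[:k]


# Toy two-attribute table, made up for illustration only.
DB: List[Row] = [(1, 9), (3, 4), (4, 4), (5, 2), (6, 6), (9, 1)]

sky = skyline(DB)

# With no effective predicate, the top-1 tuple under a monotonic ranking
# function (here two weighted sums) is a skyline tuple.
no_bound = (float("inf"), float("inf"))
top_sum = top_k_query(DB, no_bound, 1, lambda r: r[0] + r[1])
top_weighted = top_k_query(DB, no_bound, 1, lambda r: 3 * r[0] + r[1])

# A one-ended range query that returns only k = 2 of its 3 matching tuples.
limited = top_k_query(DB, (5, 5), 2, lambda r: r[0] + r[1])
```

In this toy setting, `sky` holds the four non-dominated tuples, and each top-1 answer belongs to it, which is the property the abstract attributes to skyline tuples. A real hidden database exposes only something like `top_k_query`, with a ranking function the client does not know.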

Original language: English
Pages (from-to): 600-611
Number of pages: 12
Journal: Proceedings of the VLDB Endowment
Volume: 9
Issue number: 7
DOIs: 10.14778/2904483.2904491
Publication status: Published - 1 Jan 2016
Externally published: Yes

Fingerprint

Websites
Costs
Experiments

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

Discovering the skyline of web databases. / Asudeh, Abolfazl; Thirumuruganathan, Saravanan; Zhang, Nan; Das, Gautam.

In: Proceedings of the VLDB Endowment, Vol. 9, No. 7, 01.01.2016, p. 600-611.

Asudeh, Abolfazl; Thirumuruganathan, Saravanan; Zhang, Nan; Das, Gautam. / Discovering the skyline of web databases. In: Proceedings of the VLDB Endowment. 2016; Vol. 9, No. 7. pp. 600-611.

@article{21bae59b317e4f0ba9002e4e8f0bf3bb,
title = "Discovering the skyline of web databases",
abstract = "Many web databases are {"}hidden{"} behind proprietary search interfaces that enforce the top-k output constraint, i.e., each query returns at most k of all matching tuples, preferentially selected and returned according to a proprietary ranking function. In this paper, we initiate research into the novel problem of skyline discovery over top-k hidden web databases. Since skyline tuples provide critical insights into the database and include the top-ranked tuple for every possible ranking function following the monotonic order of attribute values, skyline discovery from a hidden web database can enable a wide variety of innovative third-party applications over one or multiple web databases. Our research in the paper shows that the critical factor affecting the cost of skyline discovery is the type of search interface controls provided by the website. As such, we develop efficient algorithms for three most popular types, i.e., one-ended range, free range and point predicates, and then combine them to support web databases that feature a mixture of these types. Rigorous theoretical analysis and extensive real-world online and offline experiments demonstrate the effectiveness of our proposed techniques and their superiority over baseline solutions.",
author = "Abolfazl Asudeh and Saravanan Thirumuruganathan and Nan Zhang and Gautam Das",
year = "2016",
month = "1",
day = "1",
doi = "10.14778/2904483.2904491",
language = "English",
volume = "9",
pages = "600--611",
journal = "Proceedings of the VLDB Endowment",
issn = "2150-8097",
publisher = "Very Large Data Base Endowment Inc.",
number = "7",

}

TY - JOUR

T1 - Discovering the skyline of web databases

AU - Asudeh, Abolfazl

AU - Thirumuruganathan, Saravanan

AU - Zhang, Nan

AU - Das, Gautam

PY - 2016/1/1

Y1 - 2016/1/1

AB - Many web databases are "hidden" behind proprietary search interfaces that enforce the top-k output constraint, i.e., each query returns at most k of all matching tuples, preferentially selected and returned according to a proprietary ranking function. In this paper, we initiate research into the novel problem of skyline discovery over top-k hidden web databases. Since skyline tuples provide critical insights into the database and include the top-ranked tuple for every possible ranking function following the monotonic order of attribute values, skyline discovery from a hidden web database can enable a wide variety of innovative third-party applications over one or multiple web databases. Our research in the paper shows that the critical factor affecting the cost of skyline discovery is the type of search interface controls provided by the website. As such, we develop efficient algorithms for three most popular types, i.e., one-ended range, free range and point predicates, and then combine them to support web databases that feature a mixture of these types. Rigorous theoretical analysis and extensive real-world online and offline experiments demonstrate the effectiveness of our proposed techniques and their superiority over baseline solutions.

UR - http://www.scopus.com/inward/record.url?scp=84976594777&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84976594777&partnerID=8YFLogxK

U2 - 10.14778/2904483.2904491

DO - 10.14778/2904483.2904491

M3 - Conference article

VL - 9

SP - 600

EP - 611

JO - Proceedings of the VLDB Endowment

JF - Proceedings of the VLDB Endowment

SN - 2150-8097

IS - 7

ER -