Processing top-k join queries

Minji Wu, Laure Berti-Equille, Amélie Marian, Cecilia M. Procopiuc, Divesh Srivastava

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

Abstract

We consider the problem of efficiently finding the top-k answers for join queries over web-accessible databases. Classical algorithms for finding top-k answers use branch-and-bound techniques to avoid computing scores of all candidates in identifying the top-k answers. To be able to apply such techniques, it is critical to efficiently compute (lower and upper) bounds and expected scores of candidate answers in an incremental fashion during the evaluation. In this paper, we describe novel techniques for these problems. The first contribution of this paper is a method to efficiently compute bounds for the score of a query result when tuples in tables from the "FROM" clause are discovered incrementally, through either sorted or random access. Our second contribution is an algorithm that, given a set of partially evaluated candidate answers, determines a good order in which to access the tables to minimize wasted efforts in the computation of top-k answers. We evaluate our algorithms on a variety of queries and data sets and demonstrate the significant benefits they provide.
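The bound-based pruning that the abstract refers to can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a sum-aggregated, non-negative score, one partial score per table, and a known maximum score still obtainable from each table (as sorted access would provide). The names `Candidate`, `score_bounds`, `prune`, and `max_unseen` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A partially evaluated join answer (hypothetical structure)."""
    key: str
    # Maps table name -> partial score already discovered for this candidate.
    known: dict = field(default_factory=dict)


def score_bounds(cand, tables, max_unseen):
    """Return (lower, upper) bounds on a sum-aggregated score.

    Assumption: scores are non-negative, so an undiscovered table adds
    at least 0 and at most max_unseen[table].
    """
    lower = sum(cand.known.values())
    upper = lower + sum(max_unseen[t] for t in tables if t not in cand.known)
    return lower, upper


def prune(cands, tables, max_unseen, k):
    """Drop candidates whose upper bound is below the k-th best lower bound."""
    bounds = {c.key: score_bounds(c, tables, max_unseen) for c in cands}
    lowers = sorted((lo for lo, _ in bounds.values()), reverse=True)
    if len(lowers) < k:
        return list(cands)  # too few candidates to prune safely
    threshold = lowers[k - 1]
    return [c for c in cands if bounds[c.key][1] >= threshold]


if __name__ == "__main__":
    tables = ["R", "S"]
    max_unseen = {"R": 1.0, "S": 0.5}
    cands = [
        Candidate("a", {"R": 0.75, "S": 0.5}),  # fully evaluated
        Candidate("b", {"R": 0.75}),            # S not yet accessed
        Candidate("c", {"S": 0.25}),            # R not yet accessed
        Candidate("d", {"R": 0.125}),           # S not yet accessed
    ]
    print([c.key for c in prune(cands, tables, max_unseen, k=2)])
```

In this example candidate `d` can reach at most 0.625, which is below the second-best lower bound of 0.75, so it can be discarded without accessing table `S` for it. Choosing which table to access next for the surviving candidates is the ordering problem the abstract names as the second contribution; the sketch does not attempt it.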

Original language: English
Title of host publication: Proceedings of the VLDB Endowment
Pages: 860-870
Number of pages: 11
Volume: 3
Edition: 1
Publication status: Published - Sep 2010
Externally published: Yes

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

Wu, M., Berti-Equille, L., Marian, A., Procopiuc, C. M., & Srivastava, D. (2010). Processing top-k join queries. In Proceedings of the VLDB Endowment (1 ed., Vol. 3, pp. 860-870).

Processing top-k join queries. / Wu, Minji; Berti-Equille, Laure; Marian, Amélie; Procopiuc, Cecilia M.; Srivastava, Divesh.

Proceedings of the VLDB Endowment. Vol. 3 1. ed. 2010. p. 860-870.

Wu, M, Berti-Equille, L, Marian, A, Procopiuc, CM & Srivastava, D 2010, Processing top-k join queries. in Proceedings of the VLDB Endowment. 1 edn, vol. 3, pp. 860-870.
Wu M, Berti-Equille L, Marian A, Procopiuc CM, Srivastava D. Processing top-k join queries. In Proceedings of the VLDB Endowment. 1 ed. Vol. 3. 2010. p. 860-870.
Wu, Minji ; Berti-Equille, Laure ; Marian, Amélie ; Procopiuc, Cecilia M. ; Srivastava, Divesh. / Processing top-k join queries. Proceedings of the VLDB Endowment. Vol. 3 1. ed. 2010. pp. 860-870.
@inbook{aa5f3abe04954e508527249d4982892d,
title = "Processing top-k join queries",
abstract = "We consider the problem of efficiently finding the top-k answers for join queries over web-accessible databases. Classical algorithms for finding top-k answers use branch-and-bound techniques to avoid computing scores of all candidates in identifying the top-k answers. To be able to apply such techniques, it is critical to efficiently compute (lower and upper) bounds and expected scores of candidate answers in an incremental fashion during the evaluation. In this paper, we describe novel techniques for these problems. The first contribution of this paper is a method to efficiently compute bounds for the score of a query result when tuples in tables from the {"}FROM{"} clause are discovered incrementally, through either sorted or random access. Our second contribution is an algorithm that, given a set of partially evaluated candidate answers, determines a good order in which to access the tables to minimize wasted efforts in the computation of top-k answers. We evaluate our algorithms on a variety of queries and data sets and demonstrate the significant benefits they provide.",
author = "Minji Wu and Laure Berti-Equille and Am{\'e}lie Marian and Procopiuc, {Cecilia M.} and Divesh Srivastava",
year = "2010",
month = "9",
language = "English",
volume = "3",
pages = "860--870",
booktitle = "Proceedings of the VLDB Endowment",
edition = "1"
}

TY - CHAP

T1 - Processing top-k join queries

AU - Wu, Minji

AU - Berti-Equille, Laure

AU - Marian, Amélie

AU - Procopiuc, Cecilia M.

AU - Srivastava, Divesh

PY - 2010/9

Y1 - 2010/9

N2 - We consider the problem of efficiently finding the top-k answers for join queries over web-accessible databases. Classical algorithms for finding top-k answers use branch-and-bound techniques to avoid computing scores of all candidates in identifying the top-k answers. To be able to apply such techniques, it is critical to efficiently compute (lower and upper) bounds and expected scores of candidate answers in an incremental fashion during the evaluation. In this paper, we describe novel techniques for these problems. The first contribution of this paper is a method to efficiently compute bounds for the score of a query result when tuples in tables from the "FROM" clause are discovered incrementally, through either sorted or random access. Our second contribution is an algorithm that, given a set of partially evaluated candidate answers, determines a good order in which to access the tables to minimize wasted efforts in the computation of top-k answers. We evaluate our algorithms on a variety of queries and data sets and demonstrate the significant benefits they provide.

UR - http://www.scopus.com/inward/record.url?scp=84859262820&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84859262820&partnerID=8YFLogxK

M3 - Chapter

VL - 3

SP - 860

EP - 870

BT - Proceedings of the VLDB Endowment

ER -