Supporting top-k join queries in relational databases

Ihab F. Ilyas, Walid G. Aref, Ahmed Elmagarmid

Research output: Contribution to journal › Article

137 Citations (Scopus)

Abstract

Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, where users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging applications, e.g., multimedia retrieval by content, Web databases, data mining, middlewares, and most information retrieval applications. Current relational query processors do not handle ranking queries efficiently, especially when joins are involved. In this paper, we address supporting top-k join queries in relational query processors. We introduce a new rank-join algorithm that makes use of the individual orders of its inputs to produce join results ordered on a user-specified scoring function. The idea is to rank the join results progressively during the join operation. We introduce two physical query operators based on variants of ripple join that implement the rank-join algorithm. The operators are nonblocking and can be integrated into pipelined execution plans. We also propose an efficient heuristic designed to optimize a top-k join query by choosing the best join order. We address several practical issues and optimization heuristics to integrate the new join operators in practical query processors. We implement the new operators inside a prototype database engine based on PREDATOR. The experimental evaluation of our approach compares recent algorithms for joining ranked inputs and shows superior performance.
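
The progressive ranking idea in the abstract can be illustrated with a small sketch. This is not the paper's operator implementation. It is a minimal illustration under stated assumptions: each input is a list of (join_key, score) pairs already sorted by descending score, the scoring function is the sum of the two scores (one example of a monotone function), and the inputs are read alternately. The function name `rank_join` and the tuple layout are hypothetical. A joined result is emitted as soon as its score is at least an upper bound on the score of any result that could still be formed from unread tuples, so the top-k results can be reported without reading the inputs fully.

```python
import heapq
from collections import defaultdict


def rank_join(left, right, k):
    """Return up to k join results as (score, key, left_score, right_score),
    ordered by descending combined score.

    Assumptions (not taken from the paper): inputs are lists of
    (join_key, score) sorted by descending score, and the combined
    score is left_score + right_score.
    """
    inputs = (left, right)
    pos = [0, 0]                    # next unread position in each input
    seen = (defaultdict(list), defaultdict(list))  # scores read so far, by key
    top = [None, None]              # highest score read from each input
    last = [None, None]             # most recent (lowest) score read
    heap = []                       # joined candidates, keyed by negated score
    out = []
    turn = 0

    while len(out) < k:
        # Alternate between the inputs, skipping one that is exhausted.
        if pos[turn] >= len(inputs[turn]):
            turn = 1 - turn
        if pos[turn] >= len(inputs[turn]):
            break                   # both inputs fully read

        i = turn
        key, score = inputs[i][pos[i]]
        pos[i] += 1
        if top[i] is None:
            top[i] = score
        last[i] = score
        seen[i][key].append(score)

        # Join the new tuple with matching tuples already read from the other input.
        for other in seen[1 - i].get(key, ()):
            l, r = (score, other) if i == 0 else (other, score)
            heapq.heappush(heap, (-(l + r), key, l, r))
        turn = 1 - i

        if top[0] is None or top[1] is None:
            continue                # no bound until both inputs have been read

        # Any result not yet formed uses at least one unread tuple, so its
        # score cannot exceed this bound.
        threshold = max(top[0] + last[1], last[0] + top[1])
        while heap and len(out) < k and -heap[0][0] >= threshold:
            neg, jk, l, r = heapq.heappop(heap)
            out.append((-neg, jk, l, r))

    # Inputs exhausted: remaining candidates are final, emit them in order.
    while heap and len(out) < k:
        neg, jk, l, r = heapq.heappop(heap)
        out.append((-neg, jk, l, r))
    return out


left = [("a", 10), ("b", 8), ("c", 5), ("d", 1)]
right = [("c", 9), ("a", 7), ("b", 4), ("e", 2)]
print(rank_join(left, right, 2))  # [(17, 'a', 10, 7), (14, 'c', 5, 9)]
```

In this example the two best results are reported after reading only three tuples from each input; the last tuple of each input is never touched. The alternating read order is a simplification chosen for the sketch.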

Original language: English
Pages (from-to): 207-221
Number of pages: 15
Journal: VLDB Journal
Volume: 13
Issue number: 3
DOI: 10.1007/s00778-004-0128-2
Publication status: Published - 1 Sep 2004
Externally published: Yes

Fingerprint

Mathematical operators
Middleware
Information retrieval
Joining
Data mining
Engines

Keywords

  • Query operators
  • Ranking
  • Top-k queries
  • Rank aggregation

ASJC Scopus subject areas

  • Hardware and Architecture
  • Information Systems

Cite this

Supporting top-k join queries in relational databases. / Ilyas, Ihab F.; Aref, Walid G.; Elmagarmid, Ahmed.

In: VLDB Journal, Vol. 13, No. 3, 01.09.2004, p. 207-221.

Ilyas, Ihab F. ; Aref, Walid G. ; Elmagarmid, Ahmed. / Supporting top-k join queries in relational databases. In: VLDB Journal. 2004 ; Vol. 13, No. 3. pp. 207-221.

@article{c7d5309d2cd6447fad0275cb26ab0298,
title = "Supporting top-k join queries in relational databases",
abstract = "Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, where users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging applications, e.g., multimedia retrieval by content, Web databases, data mining, middlewares, and most information retrieval applications. Current relational query processors do not handle ranking queries efficiently, especially when joins are involved. In this paper, we address supporting top-k join queries in relational query processors. We introduce a new rank-join algorithm that makes use of the individual orders of its inputs to produce join results ordered on a user-specified scoring function. The idea is to rank the join results progressively during the join operation. We introduce two physical query operators based on variants of ripple join that implement the rank-join algorithm. The operators are nonblocking and can be integrated into pipelined execution plans. We also propose an efficient heuristic designed to optimize a top-k join query by choosing the best join order. We address several practical issues and optimization heuristics to integrate the new join operators in practical query processors. We implement the new operators inside a prototype database engine based on PREDATOR. The experimental evaluation of our approach compares recent algorithms for joining ranked inputs and shows superior performance.",
keywords = "Query operators, Ranking, Top-k queries, Rank aggregation",
author = "Ilyas, {Ihab F.} and Aref, {Walid G.} and Elmagarmid, Ahmed",
year = "2004",
month = "9",
day = "1",
doi = "10.1007/s00778-004-0128-2",
language = "English",
volume = "13",
pages = "207--221",
journal = "VLDB Journal",
issn = "1066-8888",
publisher = "Springer New York",
number = "3",

}

TY - JOUR

T1 - Supporting top-k join queries in relational databases

AU - Ilyas, Ihab F.

AU - Aref, Walid G.

AU - Elmagarmid, Ahmed

PY - 2004/9/1

Y1 - 2004/9/1

AB - Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, where users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging applications, e.g., multimedia retrieval by content, Web databases, data mining, middlewares, and most information retrieval applications. Current relational query processors do not handle ranking queries efficiently, especially when joins are involved. In this paper, we address supporting top-k join queries in relational query processors. We introduce a new rank-join algorithm that makes use of the individual orders of its inputs to produce join results ordered on a user-specified scoring function. The idea is to rank the join results progressively during the join operation. We introduce two physical query operators based on variants of ripple join that implement the rank-join algorithm. The operators are nonblocking and can be integrated into pipelined execution plans. We also propose an efficient heuristic designed to optimize a top-k join query by choosing the best join order. We address several practical issues and optimization heuristics to integrate the new join operators in practical query processors. We implement the new operators inside a prototype database engine based on PREDATOR. The experimental evaluation of our approach compares recent algorithms for joining ranked inputs and shows superior performance.

KW - Query operators

KW - Ranking

KW - Top-k queries

KW - Rank aggregation

UR - http://www.scopus.com/inward/record.url?scp=6344287791&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=6344287791&partnerID=8YFLogxK

U2 - 10.1007/s00778-004-0128-2

DO - 10.1007/s00778-004-0128-2

M3 - Article

VL - 13

SP - 207

EP - 221

JO - VLDB Journal

JF - VLDB Journal

SN - 1066-8888

IS - 3

ER -