On producing high and early result throughput in multijoin query plans

Justin K. Levandoski, Mohamed E. Khalefa, Mohamed F. Mokbel

Research output: Contribution to journal > Article

3 Citations (Scopus)

Abstract

This paper introduces an efficient framework for producing high and early result throughput in multijoin query plans. While most previous research focuses on optimizing for cases involving a single join operator, this work takes a radical step by addressing query plans with multiple join operators. The proposed framework consists of two main methods, a flush algorithm and operator state manager. The framework assumes a symmetric hash join, a common method for producing early results, when processing incoming data. In this way, our methods can be applied to a group of previous join operators (optimized for single-join queries) when taking part in multijoin query plans. Specifically, our framework can be applied by 1) employing a new flushing policy to write in-memory data to disk, once memory allotment is exhausted, in a way that helps increase the probability of producing early result throughput in multijoin queries, and 2) employing a state manager that adaptively switches operators in the plan between joining in-memory data and disk-resident data in order to positively affect the early result throughput. Extensive experimental results show that the proposed methods outperform the state-of-the-art join operators optimized for both single and multijoin query plans.
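The symmetric hash join that the abstract assumes as its base operator can be illustrated with a minimal in-memory sketch. This is not the paper's framework: it omits the flush algorithm and the operator state manager entirely, and the class name `SymmetricHashJoin`, the `insert` method, and the `join_stream` helper are hypothetical names chosen for illustration. It only shows why the operator can emit results early: each arriving tuple is stored in its own side's hash table and immediately probed against the other side's table.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple

JoinResult = Tuple[Any, Any, Any]  # (join_key, left_payload, right_payload)


class SymmetricHashJoin:
    """In-memory symmetric hash join over two inputs, 'left' and 'right'.

    Illustrative only: no memory limit, no flushing to disk.
    """

    def __init__(self) -> None:
        # One hash table per input, mapping join key -> payloads seen so far.
        self.tables: Dict[str, Dict[Any, List[Any]]] = {
            "left": defaultdict(list),
            "right": defaultdict(list),
        }

    def insert(self, side: str, key: Any, payload: Any) -> List[JoinResult]:
        """Process one arriving tuple and return the results it produces."""
        if side not in self.tables:
            raise ValueError(f"unknown side: {side!r}")
        other = "right" if side == "left" else "left"

        # Build step: store the tuple in its own side's hash table.
        self.tables[side][key].append(payload)

        # Probe step: match against everything already seen on the other side.
        # .get() avoids creating empty buckets in the defaultdict.
        matches = self.tables[other].get(key, [])
        if side == "left":
            return [(key, payload, m) for m in matches]
        return [(key, m, payload) for m in matches]


def join_stream(arrivals: Iterable[Tuple[str, Any, Any]]) -> Iterator[JoinResult]:
    """Feed (side, key, payload) arrivals through the join, yielding results
    as soon as they become available."""
    join = SymmetricHashJoin()
    for side, key, payload in arrivals:
        yield from join.insert(side, key, payload)
```

For example, `list(join_stream([("left", 1, "a"), ("right", 1, "x")]))` yields a result as soon as the second tuple arrives, without waiting for either input to finish. In the setting the abstract describes, the interesting questions begin where this sketch stops: what to write to disk once memory is exhausted, and when to switch an operator between in-memory and disk-resident data.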

Original language: English
Article number: 5590243
Pages (from-to): 1888-1902
Number of pages: 15
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 23
Issue number: 12
DOIs: 10.1109/TKDE.2010.182
Publication status: Published - 31 Oct 2011
Externally published: Yes

Fingerprint

Throughput
Data storage equipment
Managers
Joining
Mathematical operators
Switches

Keywords

  • Database management
  • query processing
  • systems

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

On producing high and early result throughput in multijoin query plans. / Levandoski, Justin K.; Khalefa, Mohamed E.; Mokbel, Mohamed F.

In: IEEE Transactions on Knowledge and Data Engineering, Vol. 23, No. 12, 5590243, 31.10.2011, p. 1888-1902.


@article{9b362a364bcf4783bcebf0c1a39f0ad9,
title = "On producing high and early result throughput in multijoin query plans",
abstract = "This paper introduces an efficient framework for producing high and early result throughput in multijoin query plans. While most previous research focuses on optimizing for cases involving a single join operator, this work takes a radical step by addressing query plans with multiple join operators. The proposed framework consists of two main methods, a flush algorithm and operator state manager. The framework assumes a symmetric hash join, a common method for producing early results, when processing incoming data. In this way, our methods can be applied to a group of previous join operators (optimized for single-join queries) when taking part in multijoin query plans. Specifically, our framework can be applied by 1) employing a new flushing policy to write in-memory data to disk, once memory allotment is exhausted, in a way that helps increase the probability of producing early result throughput in multijoin queries, and 2) employing a state manager that adaptively switches operators in the plan between joining in-memory data and disk-resident data in order to positively affect the early result throughput. Extensive experimental results show that the proposed methods outperform the state-of-the-art join operators optimized for both single and multijoin query plans.",
keywords = "Database management, query processing, systems",
author = "Levandoski, {Justin K.} and Khalefa, {Mohamed E.} and Mokbel, {Mohamed F.}",
year = "2011",
month = "10",
day = "31",
doi = "10.1109/TKDE.2010.182",
language = "English",
volume = "23",
pages = "1888--1902",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE Computer Society",
number = "12",
}

TY - JOUR
T1 - On producing high and early result throughput in multijoin query plans
AU - Levandoski, Justin K.
AU - Khalefa, Mohamed E.
AU - Mokbel, Mohamed F.
PY - 2011/10/31
Y1 - 2011/10/31

N2 - This paper introduces an efficient framework for producing high and early result throughput in multijoin query plans. While most previous research focuses on optimizing for cases involving a single join operator, this work takes a radical step by addressing query plans with multiple join operators. The proposed framework consists of two main methods, a flush algorithm and operator state manager. The framework assumes a symmetric hash join, a common method for producing early results, when processing incoming data. In this way, our methods can be applied to a group of previous join operators (optimized for single-join queries) when taking part in multijoin query plans. Specifically, our framework can be applied by 1) employing a new flushing policy to write in-memory data to disk, once memory allotment is exhausted, in a way that helps increase the probability of producing early result throughput in multijoin queries, and 2) employing a state manager that adaptively switches operators in the plan between joining in-memory data and disk-resident data in order to positively affect the early result throughput. Extensive experimental results show that the proposed methods outperform the state-of-the-art join operators optimized for both single and multijoin query plans.

KW - Database management
KW - query processing
KW - systems
UR - http://www.scopus.com/inward/record.url?scp=80054922637&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80054922637&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2010.182
DO - 10.1109/TKDE.2010.182
M3 - Article
VL - 23
SP - 1888
EP - 1902
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
SN - 1041-4347
IS - 12
M1 - 5590243
ER -