Elastic pipelining in an in-memory database cluster

Li Wang, Minqi Zhou, Zhenjie Zhang, Yin Yang, Aoying Zhou, Dina Bitton

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

7 Citations (Scopus)

Abstract

An in-memory database cluster consists of multiple interconnected nodes with large RAM capacity and modern multi-core CPUs. As a conventional query processing strategy, pipelining remains a promising solution for in-memory parallel database systems, as it avoids expensive materialization of intermediate results and parallelizes data processing among nodes. However, to fully unleash the power of pipelining in a cluster with multi-core nodes, it is crucial for the query optimizer to generate good query plans with appropriate intra-node parallelism, in order to maximize CPU and network bandwidth utilization. A suboptimal plan, by contrast, causes load imbalance in the pipelines and consequently degrades query performance. Optimizing parallelism assignment at compile time is nearly impossible, as the workload on each node is affected by numerous factors and is highly dynamic during query evaluation. To tackle this problem, we propose elastic pipelining, which makes it possible to optimize intra-node parallelism assignments in the pipelines based on the actual workload at runtime. It is achieved through a new elastic iterator model and a fully optimized dynamic scheduler. The elastic iterator model extends the traditional iterator model with the ability to adjust multi-core execution dynamically. The dynamic scheduler efficiently provisions CPU cores to query execution segments in the pipelines based on lightweight measurements of the operators. Extensive experiments on real and synthetic (TPC-H) data show that our proposal achieves almost full CPU utilization on typical decision-making analytical queries, outperforming state-of-the-art open-source systems by a huge margin.
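
As a rough illustration of runtime core provisioning, the sketch below hands CPU cores to pipeline segments one at a time, always to the segment that is currently the bottleneck. It is not the scheduler described in the paper: the function name `assign_cores`, the per-core rate figures, and the greedy policy are all assumptions made for the example, and throughput is assumed to scale linearly with the number of cores.

```python
def assign_cores(per_core_rate, total_cores):
    """Greedy bottleneck-first core allocation (illustrative only).

    per_core_rate: hypothetical measured tuples/sec that one core achieves
                   on each pipeline segment.
    total_cores:   number of cores available on the node.
    """
    if total_cores < len(per_core_rate):
        raise ValueError("need at least one core per segment")

    # Every segment starts with one core so the pipeline can make progress.
    cores = {segment: 1 for segment in per_core_rate}

    # Give each remaining core to the segment with the lowest estimated
    # throughput, assuming throughput = cores * per-core rate.
    for _ in range(total_cores - len(per_core_rate)):
        bottleneck = min(cores, key=lambda s: cores[s] * per_core_rate[s])
        cores[bottleneck] += 1
    return cores


# Hypothetical measurements for three segments of one pipeline.
rates = {"scan": 400.0, "join": 100.0, "aggregate": 200.0}
allocation = assign_cores(rates, total_cores=7)
print(allocation)
```

With these made-up rates, the slow join segment receives the most cores, so all three segments end up with the same estimated throughput. A runtime scheduler would presumably repeat such a step as the measurements change during query evaluation.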

Original language: English
Title of host publication: SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 1279-1294
Number of pages: 16
Volume: 26-June-2016
ISBN (Electronic): 9781450335317
DOI: https://doi.org/10.1145/2882903.2882904
Publication status: Published - 26 Jun 2016
Externally published: Yes
Event: 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016 - San Francisco, United States
Duration: 26 Jun 2016 to 1 Jul 2016

Other

Other: 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016
Country: United States
City: San Francisco
Period: 26/6/16 to 1/7/16

Fingerprint

Program processors
Data storage equipment
Pipelines
Light measurement
Query processing
Random access storage
Weighing
Decision making
Bandwidth
Experiments

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Wang, L., Zhou, M., Zhang, Z., Yang, Y., Zhou, A., & Bitton, D. (2016). Elastic pipelining in an in-memory database cluster. In SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data (Vol. 26-June-2016, pp. 1279-1294). Association for Computing Machinery. https://doi.org/10.1145/2882903.2882904

@inproceedings{763ef237246a466fa867d3225f0886d4,
title = "Elastic pipelining in an in-memory database cluster",
abstract = "An in-memory database cluster consists of multiple interconnected nodes with large RAM capacity and modern multi-core CPUs. As a conventional query processing strategy, pipelining remains a promising solution for in-memory parallel database systems, as it avoids expensive materialization of intermediate results and parallelizes data processing among nodes. However, to fully unleash the power of pipelining in a cluster with multi-core nodes, it is crucial for the query optimizer to generate good query plans with appropriate intra-node parallelism, in order to maximize CPU and network bandwidth utilization. A suboptimal plan, by contrast, causes load imbalance in the pipelines and consequently degrades query performance. Optimizing parallelism assignment at compile time is nearly impossible, as the workload on each node is affected by numerous factors and is highly dynamic during query evaluation. To tackle this problem, we propose elastic pipelining, which makes it possible to optimize intra-node parallelism assignments in the pipelines based on the actual workload at runtime. It is achieved through a new elastic iterator model and a fully optimized dynamic scheduler. The elastic iterator model extends the traditional iterator model with the ability to adjust multi-core execution dynamically. The dynamic scheduler efficiently provisions CPU cores to query execution segments in the pipelines based on lightweight measurements of the operators. Extensive experiments on real and synthetic (TPC-H) data show that our proposal achieves almost full CPU utilization on typical decision-making analytical queries, outperforming state-of-the-art open-source systems by a huge margin.",
author = "Li Wang and Minqi Zhou and Zhenjie Zhang and Yin Yang and Aoying Zhou and Dina Bitton",
year = "2016",
month = "6",
day = "26",
doi = "10.1145/2882903.2882904",
language = "English",
volume = "26-June-2016",
pages = "1279--1294",
booktitle = "SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data",
publisher = "Association for Computing Machinery",
}

TY - GEN
T1 - Elastic pipelining in an in-memory database cluster
AU - Wang, Li
AU - Zhou, Minqi
AU - Zhang, Zhenjie
AU - Yang, Yin
AU - Zhou, Aoying
AU - Bitton, Dina
PY - 2016/6/26
Y1 - 2016/6/26
AB - An in-memory database cluster consists of multiple interconnected nodes with large RAM capacity and modern multi-core CPUs. As a conventional query processing strategy, pipelining remains a promising solution for in-memory parallel database systems, as it avoids expensive materialization of intermediate results and parallelizes data processing among nodes. However, to fully unleash the power of pipelining in a cluster with multi-core nodes, it is crucial for the query optimizer to generate good query plans with appropriate intra-node parallelism, in order to maximize CPU and network bandwidth utilization. A suboptimal plan, by contrast, causes load imbalance in the pipelines and consequently degrades query performance. Optimizing parallelism assignment at compile time is nearly impossible, as the workload on each node is affected by numerous factors and is highly dynamic during query evaluation. To tackle this problem, we propose elastic pipelining, which makes it possible to optimize intra-node parallelism assignments in the pipelines based on the actual workload at runtime. It is achieved through a new elastic iterator model and a fully optimized dynamic scheduler. The elastic iterator model extends the traditional iterator model with the ability to adjust multi-core execution dynamically. The dynamic scheduler efficiently provisions CPU cores to query execution segments in the pipelines based on lightweight measurements of the operators. Extensive experiments on real and synthetic (TPC-H) data show that our proposal achieves almost full CPU utilization on typical decision-making analytical queries, outperforming state-of-the-art open-source systems by a huge margin.
UR - http://www.scopus.com/inward/record.url?scp=84979695913&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84979695913&partnerID=8YFLogxK
U2 - 10.1145/2882903.2882904
DO - 10.1145/2882903.2882904
M3 - Conference contribution
AN - SCOPUS:84979695913
VL - 26-June-2016
SP - 1279
EP - 1294
BT - SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data
PB - Association for Computing Machinery
ER -