The power of both choices: Practical load balancing for distributed stream processing engines

Muhammad Anis Uddin Nasir, Gianmarco Morales, David García-Soriano, Nicolas Kourtellis, Marco Serafini

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

60 Citations (Scopus)

Abstract

We study the problem of load balancing in distributed stream processing engines, which is exacerbated in the presence of skew. We introduce Partial Key Grouping (PKG), a new stream partitioning scheme that adapts the classical 'power of two choices' to a distributed streaming setting by leveraging two novel techniques: key splitting and local load estimation. In so doing, it achieves better load balancing than key grouping while being more scalable than shuffle grouping. We test PKG on several large datasets, both real-world and synthetic. Compared to standard hashing, PKG reduces the load imbalance by up to several orders of magnitude, and often achieves nearly-perfect load balance. This result translates into an improvement of up to 60% in throughput and up to 45% in latency when deployed on a real Storm cluster.

Original language: English
Title of host publication: Proceedings - International Conference on Data Engineering
Publisher: IEEE Computer Society
Pages: 137-148
Number of pages: 12
Volume: 2015-May
ISBN (Print): 9781479979639
DOI: https://doi.org/10.1109/ICDE.2015.7113279
Publication status: Published - 26 May 2015
Event: 2015 31st IEEE International Conference on Data Engineering, ICDE 2015 - Seoul, Korea, Republic of
Duration: 13 Apr 2015 to 17 Apr 2015

Other

Other: 2015 31st IEEE International Conference on Data Engineering, ICDE 2015
Country: Korea, Republic of
City: Seoul
Period: 13/4/15 to 17/4/15

Fingerprint

Resource allocation
Engines
Processing
Throughput

ASJC Scopus subject areas

  • Information Systems
  • Signal Processing
  • Software

Cite this

Nasir, M. A. U., Morales, G., García-Soriano, D., Kourtellis, N., & Serafini, M. (2015). The power of both choices: Practical load balancing for distributed stream processing engines. In Proceedings - International Conference on Data Engineering (Vol. 2015-May, pp. 137-148). [7113279] IEEE Computer Society. https://doi.org/10.1109/ICDE.2015.7113279

The power of both choices: Practical load balancing for distributed stream processing engines. / Nasir, Muhammad Anis Uddin; Morales, Gianmarco; García-Soriano, David; Kourtellis, Nicolas; Serafini, Marco.

Proceedings - International Conference on Data Engineering. Vol. 2015-May IEEE Computer Society, 2015. p. 137-148 7113279.

Nasir, MAU, Morales, G, García-Soriano, D, Kourtellis, N & Serafini, M 2015, The power of both choices: Practical load balancing for distributed stream processing engines. in Proceedings - International Conference on Data Engineering. vol. 2015-May, 7113279, IEEE Computer Society, pp. 137-148, 2015 31st IEEE International Conference on Data Engineering, ICDE 2015, Seoul, Korea, Republic of, 13/4/15. https://doi.org/10.1109/ICDE.2015.7113279
Nasir MAU, Morales G, García-Soriano D, Kourtellis N, Serafini M. The power of both choices: Practical load balancing for distributed stream processing engines. In Proceedings - International Conference on Data Engineering. Vol. 2015-May. IEEE Computer Society. 2015. p. 137-148. 7113279 https://doi.org/10.1109/ICDE.2015.7113279
Nasir, Muhammad Anis Uddin; Morales, Gianmarco; García-Soriano, David; Kourtellis, Nicolas; Serafini, Marco. / The power of both choices: Practical load balancing for distributed stream processing engines. Proceedings - International Conference on Data Engineering. Vol. 2015-May IEEE Computer Society, 2015. pp. 137-148
@inproceedings{884aa7b69cd64ed89999672a079614e9,
title = "The power of both choices: Practical load balancing for distributed stream processing engines",
abstract = "We study the problem of load balancing in distributed stream processing engines, which is exacerbated in the presence of skew. We introduce Partial Key Grouping (PKG), a new stream partitioning scheme that adapts the classical 'power of two choices' to a distributed streaming setting by leveraging two novel techniques: key splitting and local load estimation. In so doing, it achieves better load balancing than key grouping while being more scalable than shuffle grouping. We test PKG on several large datasets, both real-world and synthetic. Compared to standard hashing, PKG reduces the load imbalance by up to several orders of magnitude, and often achieves nearly-perfect load balance. This result translates into an improvement of up to 60{\%} in throughput and up to 45{\%} in latency when deployed on a real Storm cluster.",
author = "Nasir, {Muhammad Anis Uddin} and Gianmarco Morales and David Garc{\'i}a-Soriano and Nicolas Kourtellis and Marco Serafini",
year = "2015",
month = "5",
day = "26",
doi = "10.1109/ICDE.2015.7113279",
language = "English",
isbn = "9781479979639",
volume = "2015-May",
pages = "137--148",
booktitle = "Proceedings - International Conference on Data Engineering",
publisher = "IEEE Computer Society",

}

TY - GEN

T1 - The power of both choices

T2 - Practical load balancing for distributed stream processing engines

AU - Nasir, Muhammad Anis Uddin

AU - Morales, Gianmarco

AU - García-Soriano, David

AU - Kourtellis, Nicolas

AU - Serafini, Marco

PY - 2015/5/26

Y1 - 2015/5/26

N2 - We study the problem of load balancing in distributed stream processing engines, which is exacerbated in the presence of skew. We introduce Partial Key Grouping (PKG), a new stream partitioning scheme that adapts the classical 'power of two choices' to a distributed streaming setting by leveraging two novel techniques: key splitting and local load estimation. In so doing, it achieves better load balancing than key grouping while being more scalable than shuffle grouping. We test PKG on several large datasets, both real-world and synthetic. Compared to standard hashing, PKG reduces the load imbalance by up to several orders of magnitude, and often achieves nearly-perfect load balance. This result translates into an improvement of up to 60% in throughput and up to 45% in latency when deployed on a real Storm cluster.

UR - http://www.scopus.com/inward/record.url?scp=84940858966&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84940858966&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2015.7113279

DO - 10.1109/ICDE.2015.7113279

M3 - Conference contribution

AN - SCOPUS:84940858966

SN - 9781479979639

VL - 2015-May

SP - 137

EP - 148

BT - Proceedings - International Conference on Data Engineering

PB - IEEE Computer Society

ER -