DRS: Auto-Scaling for Real-Time Stream Analytics

Tom Z.J. Fu, Jianbing Ding, Richard T.B. Ma, Marianne Winslett, Yin Yang, Zhenjie Zhang

Research output: Contribution to journal (Article)

15 Citations (Scopus)

Abstract

In a stream data analytics system, input data arrive continuously and trigger the processing and updating of analytics results. We focus on applications with real-time constraints, in which any data unit must be completely processed within a given time duration. To handle fast data, it is common to place the stream data analytics system on top of a cloud infrastructure. Because stream properties, such as arrival rates, can fluctuate unpredictably, cloud resources must be dynamically provisioned and scheduled accordingly to ensure real-time responses. It is essential for existing systems or future developments to be able to scale resources dynamically according to the instantaneous workload, in order to avoid wasting resources or failing to deliver correct analytics results on time. Motivated by this, we propose DRS, a dynamic resource scaling framework for cloud-based stream data analytics systems. DRS overcomes three fundamental challenges: (1) how to model the relationship between the provisioned resources and the application performance, (2) where to best place resources, and (3) how to measure the system load with minimal overhead. In particular, DRS includes an accurate performance model based on the theory of Jackson open queueing networks and is capable of handling arbitrary operator topologies, possibly with loops, splits, and joins. Extensive experiments with real data show that DRS is capable of detecting sub-optimal resource allocation and making quick and effective resource adjustments.
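As a rough illustration of how a Jackson open queueing network can relate provisioned resources to expected latency, the sketch below solves the traffic equations for an operator topology with a loop, treats each operator as an M/M/k station, and greedily hands out spare processors. The abstract does not give the model's details, so the M/M/k assumption, the greedy heuristic, and every function and parameter name here are assumptions made for illustration, not the DRS implementation.

```python
from math import factorial, inf


def arrival_rates(gamma, routing, iters=10_000, tol=1e-12):
    """Solve the traffic equations lam_i = gamma_i + sum_j lam_j * routing[j][i]
    by fixed-point iteration (converges when every tuple eventually leaves)."""
    n = len(gamma)
    lam = list(gamma)
    for _ in range(iters):
        new = [gamma[i] + sum(lam[j] * routing[j][i] for j in range(n))
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, lam)) < tol:
            return new
        lam = new
    raise ValueError("traffic equations did not converge")


def mmk_sojourn(lam, mu, k):
    """Expected time spent at an M/M/k station (Erlang C); inf if unstable."""
    if lam == 0:
        return 1.0 / mu
    a = lam / mu                      # offered load
    if k <= a:
        return inf
    rho = a / k                       # per-server utilisation
    tail = a**k / (factorial(k) * (1 - rho))
    p0 = 1.0 / (sum(a**n / factorial(n) for n in range(k)) + tail)
    lq = p0 * tail * rho / (1 - rho)  # mean queue length
    return lq / lam + 1.0 / mu        # waiting time plus service time


def expected_latency(gamma, routing, mu, k):
    """Mean end-to-end time per external input, via Little's law."""
    lam = arrival_rates(gamma, routing)
    in_system = sum(l * mmk_sojourn(l, m, c) for l, m, c in zip(lam, mu, k))
    return in_system / sum(gamma)


def greedy_allocate(gamma, routing, mu, budget):
    """Hypothetical heuristic: start from the smallest stable allocation,
    then give each spare processor to the operator that gains the most."""
    lam = arrival_rates(gamma, routing)
    k = [int(l / m) + 1 for l, m in zip(lam, mu)]
    if sum(k) > budget:
        raise ValueError("budget too small for a stable system")
    for _ in range(budget - sum(k)):
        gains = [l * (mmk_sojourn(l, m, c) - mmk_sojourn(l, m, c + 1))
                 for l, m, c in zip(lam, mu, k)]
        best = max(range(len(k)), key=gains.__getitem__)
        k[best] += 1
    return k


if __name__ == "__main__":
    # Two operators; 20% of operator 1's output loops back to operator 0.
    gamma = [10.0, 0.0]               # external arrivals per second
    routing = [[0.0, 1.0], [0.2, 0.0]]
    mu = [5.0, 4.0]                   # service rate of one processor
    k = greedy_allocate(gamma, routing, mu, budget=10)
    print(k, expected_latency(gamma, routing, mu, k))
```

In this toy setup the loop inflates the load seen by both operators above the external arrival rate, which is why the traffic equations are solved before any processors are assigned.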

Original language: English
Pages (from-to): 3338-3352
Number of pages: 15
Journal: IEEE/ACM Transactions on Networking
Volume: 25
Issue number: 6
DOI: https://doi.org/10.1109/TNET.2017.2741969
Publication status: Published - 1 Dec 2017

Fingerprint

Queueing networks
Resource allocation
Mathematical operators
Topology
Processing
Experiments

Keywords

  • queueing network model
  • resource auto-scaling
  • stream data analytics
  • cloud computing

ASJC Scopus subject areas

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Electrical and Electronic Engineering

Cite this

Fu, T. Z. J., Ding, J., Ma, R. T. B., Winslett, M., Yang, Y., & Zhang, Z. (2017). DRS: Auto-Scaling for Real-Time Stream Analytics. IEEE/ACM Transactions on Networking, 25(6), 3338-3352. https://doi.org/10.1109/TNET.2017.2741969

@article{22614b531d6e4e7dae47321d8efa1729,
title = "DRS: Auto-Scaling for Real-Time Stream Analytics",
keywords = "queueing network model, resource auto-scaling, stream data analytics, cloud computing",
author = "Fu, {Tom Z.J.} and Jianbing Ding and Ma, {Richard T.B.} and Marianne Winslett and Yin Yang and Zhenjie Zhang",
year = "2017",
month = "12",
day = "1",
doi = "10.1109/TNET.2017.2741969",
language = "English",
volume = "25",
pages = "3338--3352",
journal = "IEEE/ACM Transactions on Networking",
issn = "1063-6692",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "6",

}

TY - JOUR

T1 - DRS

T2 - Auto-Scaling for Real-Time Stream Analytics

AU - Fu, Tom Z.J.

AU - Ding, Jianbing

AU - Ma, Richard T.B.

AU - Winslett, Marianne

AU - Yang, Yin

AU - Zhang, Zhenjie

PY - 2017/12/1

Y1 - 2017/12/1

KW - queueing network model

KW - resource auto-scaling

KW - stream data analytics

KW - cloud computing

UR - http://www.scopus.com/inward/record.url?scp=85029168732&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85029168732&partnerID=8YFLogxK

U2 - 10.1109/TNET.2017.2741969

DO - 10.1109/TNET.2017.2741969

M3 - Article

AN - SCOPUS:85029168732

VL - 25

SP - 3338

EP - 3352

JO - IEEE/ACM Transactions on Networking

JF - IEEE/ACM Transactions on Networking

SN - 1063-6692

IS - 6

ER -