Real-time failure prediction in online services

Mohammed Shatnawi, Mohamed Hefeeda

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

Current data mining techniques used to create failure predictors for online services require massive amounts of data to build, train, and test the predictors. These operations are tedious and time-consuming, and they are not done in real time. Also, the accuracy of the resulting predictor is highly compromised by changes that affect the environment and working conditions of the predictor. We propose a new approach to creating a dynamic failure predictor for online services in real time and keeping its accuracy high during the service's run-time changes. We use synthetic transactions during the run-time lifecycle to generate current data about the service. This data is used in its ephemeral state to build, train, test, and maintain an up-to-date failure predictor. We implemented the proposed approach in a large-scale online ad service that processes billions of requests each month in six data centers distributed across three continents. We show that the proposed predictor is able to maintain failure prediction accuracy as high as 86% during online service changes, whereas the accuracy of state-of-the-art predictors may drop to less than 10%.
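The abstract does not describe the predictor's internals, so the following is only a minimal sketch of the general idea: results of synthetic transactions feed a small sliding window, and the predictor is refit from that window alone so that it tracks service changes. The class name `WindowedFailurePredictor`, the single latency feature, the window size, and the nearest-centroid model are all assumptions made for illustration; they are not the authors' method.

```python
from collections import deque
from statistics import mean


class WindowedFailurePredictor:
    """Hypothetical nearest-centroid classifier refit on recent probe results."""

    def __init__(self, window_size=20):
        # Only the newest `window_size` samples are kept; older ones fall out.
        self.window = deque(maxlen=window_size)
        self.centroids = {}  # label (True = failed) -> mean feature vector

    def observe(self, features, failed):
        """Record one synthetic-transaction result and refit the model."""
        self.window.append((tuple(features), bool(failed)))
        self._refit()

    def _refit(self):
        # Recompute one centroid per class from the current window only.
        self.centroids = {}
        for label in (False, True):
            rows = [f for f, y in self.window if y == label]
            if rows:
                self.centroids[label] = tuple(mean(col) for col in zip(*rows))

    def predict(self, features):
        """Return True if the nearest class centroid is the failure class."""
        if not self.centroids:
            return False  # no data yet: assume healthy

        def squared_distance(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid))

        return min(self.centroids, key=lambda lbl: squared_distance(self.centroids[lbl]))


# Assumed scenario: the only feature is probe latency in milliseconds.
predictor = WindowedFailurePredictor(window_size=20)

# Before a service change: healthy probes near 100 ms, failing ones near 400 ms.
for _ in range(10):
    predictor.observe((100,), failed=False)
    predictor.observe((400,), failed=True)
before_change = predictor.predict((380,))

# After a service change: healthy latency is now near 400 ms, failures near 900 ms.
for _ in range(10):
    predictor.observe((400,), failed=False)
    predictor.observe((900,), failed=True)
after_change = predictor.predict((380,))
```

In this toy scenario a 380 ms probe is classified as a failure before the change and as healthy after the window has refilled with post-change data, which is the kind of drift a predictor trained once on historical data would miss.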

Original language: English
Title of host publication: Proceedings - IEEE INFOCOM
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1391-1399
Number of pages: 9
Volume: 26
ISBN (Print): 9781479983810
DOIs: https://doi.org/10.1109/INFOCOM.2015.7218516
Publication status: Published - 21 Aug 2015
Event: 34th IEEE Annual Conference on Computer Communications and Networks, IEEE INFOCOM 2015 - Hong Kong, Hong Kong
Duration: 26 Apr 2015 to 1 May 2015

Other

Other: 34th IEEE Annual Conference on Computer Communications and Networks, IEEE INFOCOM 2015
Country: Hong Kong
City: Hong Kong
Period: 26/4/15 to 1/5/15

Fingerprint

Data mining

ASJC Scopus subject areas

  • Computer Science (all)
  • Electrical and Electronic Engineering

Cite this

Shatnawi, M., & Hefeeda, M. (2015). Real-time failure prediction in online services. In Proceedings - IEEE INFOCOM (Vol. 26, pp. 1391-1399). [7218516] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/INFOCOM.2015.7218516

@inproceedings{86cf7f141114468fb319ba08c7ff2407,
title = "Real-time failure prediction in online services",
author = "Mohammed Shatnawi and Mohamed Hefeeda",
year = "2015",
month = "8",
day = "21",
doi = "10.1109/INFOCOM.2015.7218516",
language = "English",
isbn = "9781479983810",
volume = "26",
pages = "1391--1399",
booktitle = "Proceedings - IEEE INFOCOM",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Real-time failure prediction in online services

AU - Shatnawi, Mohammed

AU - Hefeeda, Mohamed

PY - 2015/8/21

Y1 - 2015/8/21

UR - http://www.scopus.com/inward/record.url?scp=84954220085&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84954220085&partnerID=8YFLogxK

U2 - 10.1109/INFOCOM.2015.7218516

DO - 10.1109/INFOCOM.2015.7218516

M3 - Conference contribution

SN - 9781479983810

VL - 26

SP - 1391

EP - 1399

BT - Proceedings - IEEE INFOCOM

PB - Institute of Electrical and Electronics Engineers Inc.

ER -