Predicting completion times of batch query workloads using interaction-aware models and simulation

Mumtaz Ahmad, Songyun Duan, Ashraf Aboulnaga, Shivnath Babu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

29 Citations (Scopus)

Abstract

A question that database administrators (DBAs) routinely need to answer is how long a batch query workload will take to complete. This question arises, for example, while planning the execution of different report-generation workloads to fit within available time windows. To answer this question accurately, we need to take into account that the typical workload in a database system consists of mixes of concurrent queries. Interactions among different queries in these mixes need to be modeled, rather than the conventional approach of considering each query separately. This paper presents a new approach for estimating workload completion times that takes the significant impact of query interactions into account. This approach builds performance models using an experiment-driven technique, by sampling the space of possible query mixes and fitting statistical models to the observed performance at these samples. No prior assumptions are made about the internal workings of the database system or the cause of query interactions, making the models robust and portable. We show that a careful choice of sampling and statistical modeling strategies can result in accurate models, and we present a novel interaction-aware workload simulator that uses these models to estimate workload completion times. An experimental evaluation with complex TPC-H queries on IBM DB2 shows that this approach consistently predicts workload completion times with less than 20% error.
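
The abstract describes the simulator only at a high level. The following minimal sketch shows one way such an interaction-aware simulation loop could look; it is not the authors' implementation. The names `toy_model` and `simulate`, the FIFO scheduling with a fixed number of concurrent slots (`mpl`), and every numeric coefficient are assumptions made for illustration. In the approach described above, the model would instead be a statistical model fitted to measurements at sampled query mixes.

```python
from collections import Counter, deque
from typing import Callable, List

# Hypothetical model signature: predicted run time (seconds) of one query of
# type `qtype` if it ran start-to-finish inside the concurrent mix `mix`.
Model = Callable[[str, Counter], float]


def toy_model(qtype: str, mix: Counter) -> float:
    """Illustrative stand-in for a fitted model; all numbers are invented."""
    base = {"Q1": 10.0, "Q2": 20.0}    # run time when the query runs alone
    slowdown = {"Q1": 2.0, "Q2": 5.0}  # extra seconds added by each concurrent query of that type
    others = Counter(mix)
    others[qtype] -= 1                 # do not count the query itself
    return base[qtype] + sum(slowdown[t] * n for t, n in others.items())


def simulate(workload: List[str], mpl: int, model: Model) -> float:
    """Estimate the completion time of `workload`, run FIFO with `mpl` concurrent slots."""
    pending = deque(workload)
    running = []  # each entry is [query_type, remaining_fraction_of_work]
    clock = 0.0
    while pending or running:
        # Fill free slots from the front of the queue.
        while pending and len(running) < mpl:
            running.append([pending.popleft(), 1.0])
        mix = Counter(q for q, _ in running)
        # Each running query progresses at 1 / (predicted time in this mix).
        rates = [1.0 / model(q, mix) for q, _ in running]
        # Advance the clock to the moment the next query finishes.
        dt = min(rem / rate for (_, rem), rate in zip(running, rates))
        clock += dt
        for entry, rate in zip(running, rates):
            entry[1] -= rate * dt
        # Drop finished queries; the mix changes on the next iteration.
        running = [e for e in running if e[1] > 1e-9]
    return clock


if __name__ == "__main__":
    print(simulate(["Q1", "Q2", "Q1", "Q2"], mpl=2, model=toy_model))
```

The point of the sketch is that the predicted rate of every running query is re-evaluated whenever the mix changes, so the estimate reflects interactions rather than summing per-query times measured in isolation.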

Original language: English
Title of host publication: ACM International Conference Proceeding Series
Pages: 449-460
Number of pages: 12
DOIs: https://doi.org/10.1145/1951365.1951419
Publication status: Published - 18 Apr 2011
Externally published: Yes
Event: 14th International Conference on Extending Database Technology: Advances in Database Technology, EDBT 2011 - Uppsala, Sweden
Duration: 22 Mar 2011 to 24 Mar 2011

Other

Other: 14th International Conference on Extending Database Technology: Advances in Database Technology, EDBT 2011
Country: Sweden
City: Uppsala
Period: 22/3/11 to 24/3/11

Fingerprint

Sampling
Simulators
Planning
Experiments
Statistical Models

Keywords

  • Algorithms
  • Experimentation
  • Performance

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Ahmad, M., Duan, S., Aboulnaga, A., & Babu, S. (2011). Predicting completion times of batch query workloads using interaction-aware models and simulation. In ACM International Conference Proceeding Series (pp. 449-460) https://doi.org/10.1145/1951365.1951419

@inproceedings{3adae805bd2d46ac8664a1c335e7bba4,
title = "Predicting completion times of batch query workloads using interaction-aware models and simulation",
keywords = "Algorithms, Experimentation, Performance",
author = "Mumtaz Ahmad and Songyun Duan and Ashraf Aboulnaga and Shivnath Babu",
year = "2011",
month = "4",
day = "18",
doi = "10.1145/1951365.1951419",
language = "English",
isbn = "9781450305280",
pages = "449--460",
booktitle = "ACM International Conference Proceeding Series",

}

TY - GEN

T1 - Predicting completion times of batch query workloads using interaction-aware models and simulation

AU - Ahmad, Mumtaz

AU - Duan, Songyun

AU - Aboulnaga, Ashraf

AU - Babu, Shivnath

PY - 2011/4/18

Y1 - 2011/4/18

KW - Algorithms

KW - Experimentation

KW - Performance

UR - http://www.scopus.com/inward/record.url?scp=79953845488&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79953845488&partnerID=8YFLogxK

U2 - 10.1145/1951365.1951419

DO - 10.1145/1951365.1951419

M3 - Conference contribution

SN - 9781450305280

SP - 449

EP - 460

BT - ACM International Conference Proceeding Series

ER -