SCRAP

A statistical approach for creating a database query workload based on performance bottlenecks

James Skarie, Biplob K. Debnath, David J. Lilja, Mohamed Mokbel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With the tremendous growth in stored data, the role of database systems has become more significant than ever before. Standard query workloads, such as the TPC-C and TPC-H benchmark suites, are used to evaluate and tune the functionality and performance of database systems. Running and configuring benchmarks is a time-consuming task that requires substantial statistical expertise because of the enormous data size and the large number of queries in the workload. Subsetting can be used to reduce the number of queries in a workload. An existing workload subsetting technique selects queries based on similarities of the ranks of the queries for low-level characteristics, such as cache miss rates, or based on the execution time required on different computer systems. However, many low-level characteristics are correlated and produce similar behaviors. Also, raw execution time as a metric is too diffuse to capture important performance bottlenecks. Our goal is to select a subset of queries that can reproduce the same bottlenecks in the system as the original workload. In this paper, we propose a statistical approach for creating a database query workload based on performance bottlenecks (SCRAP). Our methodology takes a query workload and a set of system configuration parameters as inputs, and selects a subset of the queries from the workload based on the similarity of performance bottlenecks. Experimental results using the TPC-H benchmark and the PostgreSQL database system show that the reduced workload and the original workload produce similar performance bottlenecks, and that the subset accurately estimates the total execution time.
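The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one way bottleneck-based subsetting could work, not the authors' implementation. It assumes each query has already been characterized by the measured effect of each configuration parameter on its execution time. The parameter names are real PostgreSQL settings chosen for illustration; the effect values, the `rank_vector` and `select_subset` helpers, and the distance threshold are all hypothetical. Queries whose parameter rankings are close are grouped, and one representative per group forms the reduced workload.

```python
from math import dist


def rank_vector(effects, params):
    """Rank parameters by effect magnitude (1 = largest bottleneck)."""
    ordered = sorted(params, key=lambda p: -abs(effects[p]))
    position = {p: i + 1 for i, p in enumerate(ordered)}
    return [position[p] for p in params]


def select_subset(query_effects, threshold):
    """Greedily group queries whose bottleneck rankings are similar.

    Returns a dict mapping each representative query to the queries it
    stands in for. The greedy rule and Euclidean distance are assumptions
    made for this sketch.
    """
    params = sorted(next(iter(query_effects.values())))
    ranks = {q: rank_vector(e, params) for q, e in query_effects.items()}
    groups = {}
    for q in sorted(ranks):
        nearest = min(groups, key=lambda r: dist(ranks[r], ranks[q]), default=None)
        if nearest is not None and dist(ranks[nearest], ranks[q]) <= threshold:
            groups[nearest].append(q)  # similar bottlenecks: reuse representative
        else:
            groups[q] = [q]  # new bottleneck profile: keep this query
    return groups


# Made-up effect values, for illustration only.
effects = {
    "Q1": {"shared_buffers": 9.0, "work_mem": 2.0, "random_page_cost": 0.5},
    "Q2": {"shared_buffers": 8.0, "work_mem": 3.0, "random_page_cost": 1.0},
    "Q3": {"shared_buffers": 0.4, "work_mem": 7.5, "random_page_cost": 1.2},
}
groups = select_subset(effects, threshold=1.0)
print(sorted(groups))  # representatives that make up the reduced workload
```

In this toy input, Q1 and Q2 rank the parameters identically and so share a representative, while Q3 has a different dominant parameter and is kept separately.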

Original language: English
Title of host publication: Proceedings of the 2007 IEEE International Symposium on Workload Characterization, IISWC
Pages: 183-192
Number of pages: 10
DOIs: https://doi.org/10.1109/IISWC.2007.4362194
Publication status: Published - 1 Dec 2007
Externally published: Yes
Event: 2007 IEEE International Symposium on Workload Characterization, IISWC - Boston, MA, United States
Duration: 27 Sep 2007 to 29 Sep 2007

Other

Other: 2007 IEEE International Symposium on Workload Characterization, IISWC
Country: United States
City: Boston, MA
Period: 27/9/07 to 29/9/07

ASJC Scopus subject areas

  • Computer Science Applications
  • Hardware and Architecture
  • Electrical and Electronic Engineering

Cite this

Skarie, J., Debnath, B. K., Lilja, D. J., & Mokbel, M. (2007). SCRAP: A statistical approach for creating a database query workload based on performance bottlenecks. In Proceedings of the 2007 IEEE International Symposium on Workload Characterization, IISWC (pp. 183-192). [4362194] https://doi.org/10.1109/IISWC.2007.4362194

@inproceedings{9eee7952ebc74e4abb1b251b818d1843,
title = "SCRAP: A statistical approach for creating a database query workload based on performance bottlenecks",
author = "James Skarie and Debnath, {Biplob K.} and Lilja, {David J.} and Mohamed Mokbel",
year = "2007",
month = "12",
day = "1",
doi = "10.1109/IISWC.2007.4362194",
language = "English",
isbn = "1424415616",
pages = "183--192",
booktitle = "Proceedings of the 2007 IEEE International Symposium on Workload Characterization, IISWC",

}

TY - GEN

T1 - SCRAP

T2 - A statistical approach for creating a database query workload based on performance bottlenecks

AU - Skarie, James

AU - Debnath, Biplob K.

AU - Lilja, David J.

AU - Mokbel, Mohamed

PY - 2007/12/1

Y1 - 2007/12/1

UR - http://www.scopus.com/inward/record.url?scp=47349087255&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=47349087255&partnerID=8YFLogxK

U2 - 10.1109/IISWC.2007.4362194

DO - 10.1109/IISWC.2007.4362194

M3 - Conference contribution

SN - 1424415616

SN - 9781424415618

SP - 183

EP - 192

BT - Proceedings of the 2007 IEEE International Symposium on Workload Characterization, IISWC

ER -