Choosing a cloud DBMS: Architectures and tradeoffs

Junjay Tan, Thanaa Ghanem, Matthew Perron, Xiangyao Yu, Michael Stonebraker, David DeWitt, Marco Serafini, Ashraf Aboulnaga, Tim Kraska

Research output: Contribution to journal › Conference article

Abstract

As analytic (OLAP) applications move to the cloud, DBMSs have shifted from employing a pure shared-nothing design with locally attached storage to a hybrid design that combines the use of shared storage (e.g., AWS S3) with the use of shared-nothing query execution mechanisms. This paper sheds light on the resulting tradeoffs, which have not been properly identified in previous work. To this end, it evaluates the TPC-H benchmark across a variety of DBMS offerings running in a cloud environment (AWS) on fast 10Gb+ networks, specifically database-as-a-service offerings (Redshift, Athena), query engines (Presto, Hive), and a traditional cloud-agnostic OLAP database (Vertica). While these comparisons cannot be apples-to-apples in all cases due to cloud configuration restrictions, we nonetheless identify patterns and design choices that are advantageous. These include prioritizing low-cost object stores like S3 for data storage, using system-agnostic yet still performant columnar formats like ORC that allow easy switching to other systems for different workloads, and making features that benefit subsequent runs, like query precompilation and caching remote data to faster storage, optional rather than required because they disadvantage ad hoc queries.

Original language: English
Pages (from-to): 2170-2182
Number of pages: 13
Journal: Proceedings of the VLDB Endowment
Volume: 12
Issue number: 12
DOIs: https://doi.org/10.14778/3352063.3352133
Publication status: Published - 1 Jan 2018
Event: 45th International Conference on Very Large Data Bases, VLDB 2019 - Los Angeles, United States
Duration: 26 Aug 2019 → 30 Aug 2019

Fingerprint

Engines
Data storage equipment
Costs

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

Tan, J., Ghanem, T., Perron, M., Yu, X., Stonebraker, M., DeWitt, D., ... Kraska, T. (2018). Choosing a cloud DBMS: Architectures and tradeoffs. Proceedings of the VLDB Endowment, 12(12), 2170-2182. https://doi.org/10.14778/3352063.3352133

Choosing a cloud DBMS: Architectures and tradeoffs. / Tan, Junjay; Ghanem, Thanaa; Perron, Matthew; Yu, Xiangyao; Stonebraker, Michael; DeWitt, David; Serafini, Marco; Aboulnaga, Ashraf; Kraska, Tim.

In: Proceedings of the VLDB Endowment, Vol. 12, No. 12, 01.01.2018, p. 2170-2182.

Tan, J, Ghanem, T, Perron, M, Yu, X, Stonebraker, M, DeWitt, D, Serafini, M, Aboulnaga, A & Kraska, T 2018, 'Choosing a cloud DBMS: Architectures and tradeoffs', Proceedings of the VLDB Endowment, vol. 12, no. 12, pp. 2170-2182. https://doi.org/10.14778/3352063.3352133
Tan J, Ghanem T, Perron M, Yu X, Stonebraker M, DeWitt D et al. Choosing a cloud DBMS: Architectures and tradeoffs. Proceedings of the VLDB Endowment. 2018 Jan 1;12(12):2170-2182. https://doi.org/10.14778/3352063.3352133
Tan, Junjay; Ghanem, Thanaa; Perron, Matthew; Yu, Xiangyao; Stonebraker, Michael; DeWitt, David; Serafini, Marco; Aboulnaga, Ashraf; Kraska, Tim. / Choosing a cloud DBMS: Architectures and tradeoffs. In: Proceedings of the VLDB Endowment. 2018; Vol. 12, No. 12. pp. 2170-2182.
@article{1a6e05d528ed4ddbae68538950ef9bb5,
title = "Choosing a cloud DBMS: Architectures and tradeoffs",
abstract = "As analytic (OLAP) applications move to the cloud, DBMSs have shifted from employing a pure shared-nothing design with locally attached storage to a hybrid design that combines the use of shared-storage (e.g., AWS S3) with the use of shared-nothing query execution mechanisms. This paper sheds light on the resulting tradeoffs, which have not been properly identified in previous work. To this end, it evaluates the TPC-H benchmark across a variety of DBMS offerings running in a cloud environment (AWS) on fast 10Gb+ networks, specifically database-as-a-service offerings (Redshift, Athena), query engines (Presto, Hive), and a traditional cloud agnostic OLAP database (Vertica). While these comparisons cannot be apples-to-apples in all cases due to cloud configuration restrictions, we nonetheless identify patterns and design choices that are advantageous. These include prioritizing low-cost object stores like S3 for data storage, using system agnostic yet still performant columnar formats like ORC that allow easy switching to other systems for different workloads, and making features that benefit subsequent runs like query precompilation and caching remote data to faster storage optional rather than required because they disadvantage ad hoc queries.",
author = "Junjay Tan and Thanaa Ghanem and Matthew Perron and Xiangyao Yu and Michael Stonebraker and David DeWitt and Marco Serafini and Ashraf Aboulnaga and Tim Kraska",
year = "2018",
month = "1",
day = "1",
doi = "10.14778/3352063.3352133",
language = "English",
volume = "12",
pages = "2170--2182",
journal = "Proceedings of the VLDB Endowment",
issn = "2150-8097",
publisher = "Very Large Data Base Endowment Inc.",
number = "12",
}

TY - JOUR

T1 - Choosing a cloud DBMS

T2 - Architectures and tradeoffs

AU - Tan, Junjay

AU - Ghanem, Thanaa

AU - Perron, Matthew

AU - Yu, Xiangyao

AU - Stonebraker, Michael

AU - DeWitt, David

AU - Serafini, Marco

AU - Aboulnaga, Ashraf

AU - Kraska, Tim

PY - 2018/1/1

Y1 - 2018/1/1

AB - As analytic (OLAP) applications move to the cloud, DBMSs have shifted from employing a pure shared-nothing design with locally attached storage to a hybrid design that combines the use of shared-storage (e.g., AWS S3) with the use of shared-nothing query execution mechanisms. This paper sheds light on the resulting tradeoffs, which have not been properly identified in previous work. To this end, it evaluates the TPC-H benchmark across a variety of DBMS offerings running in a cloud environment (AWS) on fast 10Gb+ networks, specifically database-as-a-service offerings (Redshift, Athena), query engines (Presto, Hive), and a traditional cloud agnostic OLAP database (Vertica). While these comparisons cannot be apples-to-apples in all cases due to cloud configuration restrictions, we nonetheless identify patterns and design choices that are advantageous. These include prioritizing low-cost object stores like S3 for data storage, using system agnostic yet still performant columnar formats like ORC that allow easy switching to other systems for different workloads, and making features that benefit subsequent runs like query precompilation and caching remote data to faster storage optional rather than required because they disadvantage ad hoc queries.

UR - http://www.scopus.com/inward/record.url?scp=85074524915&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85074524915&partnerID=8YFLogxK

U2 - 10.14778/3352063.3352133

DO - 10.14778/3352063.3352133

M3 - Conference article

AN - SCOPUS:85074524915

VL - 12

SP - 2170

EP - 2182

JO - Proceedings of the VLDB Endowment

JF - Proceedings of the VLDB Endowment

SN - 2150-8097

IS - 12

ER -