Clay: Fine-grained adaptive partitioning for general database schemas

Marco Serafini, Rebecca Taft, Aaron J. Elmore, Andrew Pavlo, Ashraf Aboulnaga, Michael Stonebraker

Research output: Contribution to journal › Conference article

18 Citations (Scopus)

Abstract

Transaction processing database management systems (DBMSs) are critical for today's data-intensive applications because they enable an organization to quickly ingest and query new information. Many of these applications exceed the capabilities of a single server, and thus their database has to be deployed in a distributed DBMS. The key factor affecting such a system's performance is how the database is partitioned. If the database is partitioned incorrectly, the number of distributed transactions can be high. These transactions have to synchronize their operations over the network, which is considerably slower and leads to poor performance. Previous work on elastic database repartitioning has focused on a certain class of applications whose database schema can be represented in a hierarchical tree structure. But many applications cannot be partitioned in this manner, and thus are subject to distributed transactions that impede their performance and scalability. In this paper, we present a new on-line partitioning approach, called Clay, that supports both tree-based schemas and more complex "general" schemas with arbitrary foreign key relationships. Clay dynamically creates blocks of tuples to migrate among servers during repartitioning, placing no constraints on the schema but taking care to balance load and reduce the amount of data migrated. Clay achieves this goal by including in each block a set of hot tuples and other tuples co-accessed with these hot tuples. To evaluate our approach, we integrate Clay in a distributed, main-memory DBMS and show that it can generate partitioning schemes that enable the system to achieve up to 15 × better throughput and 99% lower latency than existing approaches.
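The block-building idea in the abstract (start from hot tuples, then pull in the tuples co-accessed with them) can be illustrated with a small sketch. This is not the authors' algorithm: the greedy expansion rule, the function names `build_coaccess_graph` and `grow_block`, and the `max_size` cap are all assumptions made for illustration, and the sketch ignores load balancing and migration cost, which the abstract says Clay also takes into account.

```python
from collections import defaultdict
from itertools import combinations


def build_coaccess_graph(transactions):
    """Count per-tuple accesses and pairwise co-accesses.

    `transactions` is a list of sets of tuple ids (a hypothetical trace format).
    """
    load = defaultdict(int)   # tuple id -> number of transactions touching it
    edges = defaultdict(int)  # (a, b) with a < b -> number of co-accesses
    for txn in transactions:
        for t in txn:
            load[t] += 1
        for a, b in combinations(sorted(txn), 2):
            edges[(a, b)] += 1
    return load, edges


def grow_block(transactions, max_size):
    """Greedily build one block: hottest tuple first, then its co-accessed tuples.

    Assumed heuristic, not taken from the paper: at each step, add the outside
    tuple with the highest total co-access weight to the current block.
    Ties are broken by tuple id so the result is deterministic.
    """
    load, edges = build_coaccess_graph(transactions)
    if not load:
        return []
    seed = max(sorted(load), key=lambda t: load[t])
    block = [seed]
    while len(block) < max_size:
        affinity = defaultdict(int)
        for (a, b), weight in edges.items():
            if a in block and b not in block:
                affinity[b] += weight
            elif b in block and a not in block:
                affinity[a] += weight
        if not affinity:
            break  # nothing outside the block is co-accessed with it
        block.append(max(sorted(affinity), key=lambda t: affinity[t]))
    return block


if __name__ == "__main__":
    trace = [{"w1", "c7"}, {"w1", "c7"}, {"w1", "s3"}, {"w2", "c9"}]
    print(grow_block(trace, max_size=3))
```

In this toy trace, `w1` is the hottest tuple, so it seeds the block, and the tuples that share transactions with it are added before anything else. Moving such a block as a unit keeps those transactions on a single server, which is the effect the abstract attributes to Clay's blocks.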

Original language: English
Pages (from-to): 445-456
Number of pages: 12
Journal: Proceedings of the VLDB Endowment
Volume: 10
Issue number: 4
DOI: 10.14778/3025111.3025125
Publication status: Published - 1 Jan 2016

Fingerprint

Clay
Servers
Scalability
Throughput
Data storage equipment
Processing

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

Clay: Fine-grained adaptive partitioning for general database schemas. / Serafini, Marco; Taft, Rebecca; Elmore, Aaron J.; Pavlo, Andrew; Aboulnaga, Ashraf; Stonebraker, Michael.

In: Proceedings of the VLDB Endowment, Vol. 10, No. 4, 01.01.2016, p. 445-456.

Serafini, Marco; Taft, Rebecca; Elmore, Aaron J.; Pavlo, Andrew; Aboulnaga, Ashraf; Stonebraker, Michael. / Clay: Fine-grained adaptive partitioning for general database schemas. In: Proceedings of the VLDB Endowment. 2016; Vol. 10, No. 4. pp. 445-456.

@article{63edd692b1cb4f62babb9f8c8c15526d,
title = "Clay: Fine-grained adaptive partitioning for general database schemas",
abstract = "Transaction processing database management systems (DBMSs) are critical for today's data-intensive applications because they enable an organization to quickly ingest and query new information. Many of these applications exceed the capabilities of a single server, and thus their database has to be deployed in a distributed DBMS. The key factor affecting such a system's performance is how the database is partitioned. If the database is partitioned incorrectly, the number of distributed transactions can be high. These transactions have to synchronize their operations over the network, which is considerably slower and leads to poor performance. Previous work on elastic database repartitioning has focused on a certain class of applications whose database schema can be represented in a hierarchical tree structure. But many applications cannot be partitioned in this manner, and thus are subject to distributed transactions that impede their performance and scalability. In this paper, we present a new on-line partitioning approach, called Clay, that supports both tree-based schemas and more complex {"}general{"} schemas with arbitrary foreign key relationships. Clay dynamically creates blocks of tuples to migrate among servers during repartitioning, placing no constraints on the schema but taking care to balance load and reduce the amount of data migrated. Clay achieves this goal by including in each block a set of hot tuples and other tuples co-accessed with these hot tuples. To evaluate our approach, we integrate Clay in a distributed, main-memory DBMS and show that it can generate partitioning schemes that enable the system to achieve up to 15 × better throughput and 99{\%} lower latency than existing approaches.",
author = "Marco Serafini and Rebecca Taft and Elmore, {Aaron J.} and Andrew Pavlo and Ashraf Aboulnaga and Michael Stonebraker",
year = "2016",
month = "1",
day = "1",
doi = "10.14778/3025111.3025125",
language = "English",
volume = "10",
pages = "445--456",
journal = "Proceedings of the VLDB Endowment",
issn = "2150-8097",
publisher = "Very Large Data Base Endowment Inc.",
number = "4",

}

TY - JOUR

T1 - Clay

T2 - Fine-grained adaptive partitioning for general database schemas

AU - Serafini, Marco

AU - Taft, Rebecca

AU - Elmore, Aaron J.

AU - Pavlo, Andrew

AU - Aboulnaga, Ashraf

AU - Stonebraker, Michael

PY - 2016/1/1

Y1 - 2016/1/1

AB - Transaction processing database management systems (DBMSs) are critical for today's data-intensive applications because they enable an organization to quickly ingest and query new information. Many of these applications exceed the capabilities of a single server, and thus their database has to be deployed in a distributed DBMS. The key factor affecting such a system's performance is how the database is partitioned. If the database is partitioned incorrectly, the number of distributed transactions can be high. These transactions have to synchronize their operations over the network, which is considerably slower and leads to poor performance. Previous work on elastic database repartitioning has focused on a certain class of applications whose database schema can be represented in a hierarchical tree structure. But many applications cannot be partitioned in this manner, and thus are subject to distributed transactions that impede their performance and scalability. In this paper, we present a new on-line partitioning approach, called Clay, that supports both tree-based schemas and more complex "general" schemas with arbitrary foreign key relationships. Clay dynamically creates blocks of tuples to migrate among servers during repartitioning, placing no constraints on the schema but taking care to balance load and reduce the amount of data migrated. Clay achieves this goal by including in each block a set of hot tuples and other tuples co-accessed with these hot tuples. To evaluate our approach, we integrate Clay in a distributed, main-memory DBMS and show that it can generate partitioning schemes that enable the system to achieve up to 15 × better throughput and 99% lower latency than existing approaches.

UR - http://www.scopus.com/inward/record.url?scp=85020426057&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85020426057&partnerID=8YFLogxK

U2 - 10.14778/3025111.3025125

DO - 10.14778/3025111.3025125

M3 - Conference article

AN - SCOPUS:85020426057

VL - 10

SP - 445

EP - 456

JO - Proceedings of the VLDB Endowment

JF - Proceedings of the VLDB Endowment

SN - 2150-8097

IS - 4

ER -