Shasta

Interactive reporting at scale

Gokul Nath Babu Manoharan, Stephan Ellner, Karl Schnaitter, Sridatta Chegu, Alejandro Estrella-Balderrama, Stephan Gudmundson, Apurv Gupta, Ben Handy, Bart Samwel, Chad Whipkey, Larysa Aharkava, Himani Apte, Nitin Gangahar, Jun Xu, Shivakumar Venkataraman, Divyakant Agrawal, Jeffrey D. Ullman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

We describe Shasta, a middleware system built at Google to support interactive reporting in complex user-facing applications related to Google's Internet advertising business. Shasta targets applications with challenging requirements: First, user query latencies must be low. Second, underlying transactional data stores have complex "read-unfriendly" schemas, placing significant transformation logic between stored data and the read-only views that Shasta exposes to its clients. This transformation logic must be expressed in a way that scales to large and agile engineering teams. Finally, Shasta targets applications with strong data freshness requirements, making it challenging to precompute query results using common techniques such as ETL pipelines or materialized views. Instead, online queries must go all the way from primary storage to user-facing views, resulting in complex queries joining 50 or more tables. Designed as a layer on top of Google's F1 RDBMS and Mesa data warehouse, Shasta combines language and system techniques to meet these requirements. To help with expressing complex view specifications, we developed a query language called RVL, with support for modularized view templates that can be dynamically compiled into SQL. To execute these SQL queries with low latency at scale, we leveraged and extended F1's distributed query engine with facilities such as safe execution of C++ and Java UDFs. To reduce latency and increase read parallelism, we extended F1 storage with a distributed read-only in-memory cache. The system we describe is in production at Google, powering critical applications used by advertisers and internal sales teams. Shasta has significantly improved system scalability and software engineering efficiency compared to the middleware solutions it replaced.

Original language: English
Title of host publication: SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 1393-1404
Number of pages: 12
Volume: 26-June-2016
ISBN (Electronic): 9781450335317
DOI: https://doi.org/10.1145/2882903.2904444
Publication status: Published - 26 Jun 2016
Externally published: Yes
Event: 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016 - San Francisco, United States
Duration: 26 Jun 2016 to 1 Jul 2016

Other

Other: 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016
Country: United States
City: San Francisco
Period: 26/6/16 to 1/7/16

Fingerprint

Middleware
Cache memory
Facings
Data warehouses
Query languages
Joining
Scalability
Marketing
Software engineering
Sales
Pipelines
Internet
Engines
Specifications
Industry

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Manoharan, G. N. B., Ellner, S., Schnaitter, K., Chegu, S., Estrella-Balderrama, A., Gudmundson, S., ... Ullman, J. D. (2016). Shasta: Interactive reporting at scale. In SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data (Vol. 26-June-2016, pp. 1393-1404). Association for Computing Machinery. https://doi.org/10.1145/2882903.2904444

@inproceedings{77be9fc2447847f697a88ce25c2367b5,
title = "Shasta: Interactive reporting at scale",
author = "Manoharan, {Gokul Nath Babu} and Stephan Ellner and Karl Schnaitter and Sridatta Chegu and Alejandro Estrella-Balderrama and Stephan Gudmundson and Apurv Gupta and Ben Handy and Bart Samwel and Chad Whipkey and Larysa Aharkava and Himani Apte and Nitin Gangahar and Jun Xu and Shivakumar Venkataraman and Divyakant Agrawal and Ullman, {Jeffrey D.}",
year = "2016",
month = "6",
day = "26",
doi = "10.1145/2882903.2904444",
language = "English",
volume = "26-June-2016",
pages = "1393--1404",
booktitle = "SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data",
publisher = "Association for Computing Machinery",

}

TY - GEN

T1 - Shasta

T2 - Interactive reporting at scale

AU - Manoharan, Gokul Nath Babu

AU - Ellner, Stephan

AU - Schnaitter, Karl

AU - Chegu, Sridatta

AU - Estrella-Balderrama, Alejandro

AU - Gudmundson, Stephan

AU - Gupta, Apurv

AU - Handy, Ben

AU - Samwel, Bart

AU - Whipkey, Chad

AU - Aharkava, Larysa

AU - Apte, Himani

AU - Gangahar, Nitin

AU - Xu, Jun

AU - Venkataraman, Shivakumar

AU - Agrawal, Divyakant

AU - Ullman, Jeffrey D.

PY - 2016/6/26

Y1 - 2016/6/26

UR - http://www.scopus.com/inward/record.url?scp=84979673824&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84979673824&partnerID=8YFLogxK

U2 - 10.1145/2882903.2904444

DO - 10.1145/2882903.2904444

M3 - Conference contribution

VL - 26-June-2016

SP - 1393

EP - 1404

BT - SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data

PB - Association for Computing Machinery

ER -