Zab: High-performance broadcast for primary-backup systems

Flavio P. Junqueira, Benjamin C. Reed, Marco Serafini

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

124 Citations (Scopus)

Abstract

Zab is a crash-recovery atomic broadcast algorithm we designed for the ZooKeeper coordination service. ZooKeeper implements a primary-backup scheme in which a primary process executes client operations and uses Zab to propagate the corresponding incremental state changes to backup processes. Due to the dependence of an incremental state change on the sequence of changes previously generated, Zab must guarantee that if it delivers a given state change, then all other changes it depends upon must be delivered first. Since primaries may crash, Zab must satisfy this requirement despite crashes of primaries. Applications using ZooKeeper demand high performance from the service, and consequently, one important goal is the ability to have multiple outstanding client operations at a time. Zab enables multiple outstanding state changes by guaranteeing that at most one primary is able to broadcast state changes and have them incorporated into the state, and by using a synchronization phase while establishing a new primary. Before this synchronization phase completes, a new primary does not broadcast new state changes. Finally, Zab uses an identification scheme for state changes that enables a process to easily identify missing changes. This feature is key for efficient recovery. Experiments and experience so far in production show that our design enables an implementation that meets the performance requirements of our applications. Our implementation of Zab can achieve tens of thousands of broadcasts per second, which is sufficient for demanding systems such as our Web-scale applications.
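
The abstract does not spell out the identification scheme, so the following is only a minimal sketch of how such a scheme could let a process find missing changes. It assumes each state change carries an identifier made of an epoch (one per established primary) and a counter within that epoch, compared lexicographically. The names `Zxid` and `missing_changes` are hypothetical and are not taken from the ZooKeeper code base.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True, order=True)
class Zxid:
    """Hypothetical identifier for a state change (assumed layout)."""
    epoch: int    # assumed to increase each time a new primary is established
    counter: int  # assumed to increase with each change within an epoch


def missing_changes(primary_log: List[Zxid], last_seen: Zxid) -> List[Zxid]:
    """Return, in order, the changes in primary_log that come after last_seen."""
    return [z for z in sorted(primary_log) if z > last_seen]


# A backup that last applied change (1, 2) asks what it still needs.
log = [Zxid(1, 1), Zxid(1, 2), Zxid(2, 1), Zxid(2, 2)]
behind = missing_changes(log, Zxid(1, 2))
print(behind)
```

Because identifiers are totally ordered in this sketch, a recovering process only needs to report the last identifier it has seen, and the gap can be computed with a single comparison per change.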

Original language: English
Title of host publication: 2011 IEEE/IFIP 41st International Conference on Dependable Systems and Networks, DSN 2011
Pages: 245-256
Number of pages: 12
DOIs: https://doi.org/10.1109/DSN.2011.5958223
Publication status: Published - 26 Aug 2011
Event: 2011 IEEE/IFIP 41st International Conference on Dependable Systems and Networks, DSN 2011 - Hong Kong, Hong Kong
Duration: 27 Jun 2011 to 30 Jun 2011

Publication series

Name: Proceedings of the International Conference on Dependable Systems and Networks

Other

Other: 2011 IEEE/IFIP 41st International Conference on Dependable Systems and Networks, DSN 2011
Country: Hong Kong
City: Hong Kong
Period: 27/6/11 to 30/6/11

Keywords

  • Asynchronous consensus
  • Atomic broadcast
  • Distributed algorithms
  • Fault tolerance
  • Primary backup

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Junqueira, F. P., Reed, B. C., & Serafini, M. (2011). Zab: High-performance broadcast for primary-backup systems. In 2011 IEEE/IFIP 41st International Conference on Dependable Systems and Networks, DSN 2011 (pp. 245-256). [5958223] (Proceedings of the International Conference on Dependable Systems and Networks). https://doi.org/10.1109/DSN.2011.5958223