Recovering from Multiple Process Failures in the Time Warp Mechanism

Divyakant Agrawal, Jonathan R. Agre

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

In this paper we describe a recovery protocol for distributed systems using the Time Warp control mechanism. The proposed protocol is fault tolerant to multiple process failures. Time Warp is an optimistic execution technique in which synchronization is achieved using rollback. Our recovery protocol exploits the redundancy already available to implement process rollback in the Time Warp mechanism. Thus, the recovery protocol has little additional bookkeeping overhead, which contrasts with many other recovery protocols.
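
To make the abstract's description of Time Warp concrete, the following is a minimal sketch of optimistic execution with rollback for a single logical process. It is an illustration only, not the recovery protocol of the paper: the class `TimeWarpProcess`, its fields, and the toy state update are all hypothetical, and antimessages (needed in a full Time Warp system to cancel messages sent during rolled-back computation) are omitted. The saved snapshots and the processed-event log are presumably the kind of redundancy the abstract says the recovery protocol exploits.

```python
from dataclasses import dataclass, field


@dataclass
class TimeWarpProcess:
    """Toy logical process (hypothetical, for illustration only)."""
    lvt: int = 0                                    # local virtual time
    state: int = 0                                  # order-sensitive toy state
    processed: list = field(default_factory=list)   # executed (timestamp, payload) events
    snapshots: list = field(default_factory=list)   # (lvt, state) saved before each event
    rollbacks: int = 0

    def _execute(self, ts, payload):
        # Save the pre-event state so this event can be undone later.
        self.snapshots.append((self.lvt, self.state))
        # Deliberately non-commutative update: the result depends on event order.
        self.state = self.state * 2 + payload
        self.lvt = ts
        self.processed.append((ts, payload))

    def receive(self, ts, payload):
        if ts >= self.lvt:
            # Optimistic case: execute immediately without waiting for other processes.
            self._execute(ts, payload)
            return
        # Straggler: its timestamp lies in this process's past, so roll back.
        self.rollbacks += 1
        redo = []
        while self.processed and self.processed[-1][0] > ts:
            redo.append(self.processed.pop())
            self.lvt, self.state = self.snapshots.pop()
        # Execute the straggler, then re-execute the undone events in timestamp order.
        self._execute(ts, payload)
        for event in reversed(redo):
            self._execute(*event)


p = TimeWarpProcess()
p.receive(10, 1)
p.receive(30, 5)   # executed optimistically
p.receive(20, 2)   # straggler: forces a rollback to virtual time 10
print(p.state, p.lvt, p.rollbacks)  # prints: 13 30 1
```

The final state matches what strictly timestamp-ordered execution of the three events would produce, which is the sense in which rollback achieves synchronization in this sketch.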

Original language: English
Pages (from-to): 1504-1514
Number of pages: 11
Journal: IEEE Transactions on Computers
Volume: 41
Issue number: 12
DOI: 10.1109/12.214658
Publication status: Published - 1992
Externally published: Yes

Fingerprint

Time Warp
Recovery
Redundancy
Synchronization
Fault-tolerant
Distributed Systems

Keywords

  • Distributed processing
  • distributed simulation
  • fault tolerance
  • optimistic synchronization
  • parallel processing
  • rollback recovery
  • virtual time

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Hardware and Architecture
  • Software
  • Theoretical Computer Science

Cite this

Recovering from Multiple Process Failures in the Time Warp Mechanism. / Agrawal, Divyakant; Agre, Jonathan R.

In: IEEE Transactions on Computers, Vol. 41, No. 12, 1992, p. 1504-1514.

@article{cb19068bf133402bb8f74e18111f2944,
title = "Recovering from Multiple Process Failures in the Time Warp Mechanism",
abstract = "In this paper we describe a recovery protocol for distributed systems using the Time Warp control mechanism. The proposed protocol is fault tolerant to multiple process failures. Time Warp is an optimistic execution technique in which synchronization is achieved using rollback. Our recovery protocol exploits the redundancy already available to implement process rollback in the Time Warp mechanism. Thus, the recovery protocol has little additional bookkeeping overhead, which contrasts with many other recovery protocols.",
keywords = "Distributed processing, distributed simulation, fault tolerance, optimistic synchronization, parallel processing, rollback recovery, virtual time",
author = "Agrawal, Divyakant and Agre, {Jonathan R.}",
year = "1992",
doi = "10.1109/12.214658",
language = "English",
volume = "41",
pages = "1504--1514",
journal = "IEEE Transactions on Computers",
issn = "0018-9340",
publisher = "IEEE Computer Society",
number = "12",
}

TY  - JOUR
T1  - Recovering from Multiple Process Failures in the Time Warp Mechanism
AU  - Agrawal, Divyakant
AU  - Agre, Jonathan R.
PY  - 1992
Y1  - 1992
N2  - In this paper we describe a recovery protocol for distributed systems using the Time Warp control mechanism. The proposed protocol is fault tolerant to multiple process failures. Time Warp is an optimistic execution technique in which synchronization is achieved using rollback. Our recovery protocol exploits the redundancy already available to implement process rollback in the Time Warp mechanism. Thus, the recovery protocol has little additional bookkeeping overhead, which contrasts with many other recovery protocols.
AB  - In this paper we describe a recovery protocol for distributed systems using the Time Warp control mechanism. The proposed protocol is fault tolerant to multiple process failures. Time Warp is an optimistic execution technique in which synchronization is achieved using rollback. Our recovery protocol exploits the redundancy already available to implement process rollback in the Time Warp mechanism. Thus, the recovery protocol has little additional bookkeeping overhead, which contrasts with many other recovery protocols.
KW  - Distributed processing
KW  - distributed simulation
KW  - fault tolerance
KW  - optimistic synchronization
KW  - parallel processing
KW  - rollback recovery
KW  - virtual time
UR  - http://www.scopus.com/inward/record.url?scp=5944240473&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=5944240473&partnerID=8YFLogxK
U2  - 10.1109/12.214658
DO  - 10.1109/12.214658
M3  - Article
VL  - 41
SP  - 1504
EP  - 1514
JO  - IEEE Transactions on Computers
JF  - IEEE Transactions on Computers
SN  - 0018-9340
IS  - 12
ER  -