LRC: Dependency-aware cache management for data analytics clusters

Yinghao Yu, Wei Wang, Jun Zhang, Khaled Letaief

Research output: Chapter in Book/Report/Conference proceedingConference contribution

9 Citations (Scopus)

Abstract

Memory caches are being aggressively used in today's data-parallel systems such as Spark, Tez, and Piccolo. However, prevalent systems employ rather simple cache management policies - notably the Least Recently Used (LRU) policy - that are oblivious to the application semantics of data dependency, expressed as a directed acyclic graph (DAG). Without this knowledge, memory caching can at best be performed by 'guessing' the future data access patterns based on historical information (e.g., the access recency and/or frequency), which frequently results in inefficient, erroneous caching with low hit ratio and a long response time. In this paper, we propose a novel cache replacement policy, Least Reference Count (LRC), which exploits the application-specific DAG information to optimize the cache management. LRC evicts the cached data blocks whose reference count is the smallest. The reference count is defined, for each data block, as the number of dependent child blocks that have not been computed yet. We demonstrate the efficacy of LRC through both empirical analysis and cluster deployments against popular benchmarking workloads. Our Spark implementation shows that, compared with LRU, LRC speeds up typical applications by 60%.

Original languageEnglish
Title of host publicationINFOCOM 2017 - IEEE Conference on Computer Communications
PublisherInstitute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)9781509053360
DOIs
Publication statusPublished - 2 Oct 2017
Externally publishedYes
Event2017 IEEE Conference on Computer Communications, INFOCOM 2017 - Atlanta, United States
Duration: 1 May 20174 May 2017

Other

Other2017 IEEE Conference on Computer Communications, INFOCOM 2017
CountryUnited States
CityAtlanta
Period1/5/174/5/17

    Fingerprint

ASJC Scopus subject areas

  • Computer Science(all)
  • Electrical and Electronic Engineering

Cite this

Yu, Y., Wang, W., Zhang, J., & Letaief, K. (2017). LRC: Dependency-aware cache management for data analytics clusters. In INFOCOM 2017 - IEEE Conference on Computer Communications [8057007] Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/INFOCOM.2017.8057007