Constructing collaborative desktop storage caches for large scientific datasets

Sudharshan S. Vazhkudai, Xiaosong Ma, Vincent W. Freeh, Jonathan W. Strickland, Nandan Tammineedi, Tyler Simon, Stephen L. Scott

Research output: Contribution to journal › Article

15 Citations (Scopus)

Abstract

High-end computing is suffering a data deluge from experiments, simulations, and apparatus that creates overwhelming application dataset sizes. This has led to the proliferation of high-end mass storage systems, storage area clusters, and data centers. These storage facilities offer a large range of choices in terms of capacity and access rate, as well as strong data availability and consistency support. However, for most end-users, the "last mile" in their analysis pipeline often requires data processing and visualization at local computers, typically local desktop workstations. End-user workstations - despite having more processing power than ever before - are ill-equipped to cope with such data demands due to insufficient secondary storage space and I/O rates. Meanwhile, a large portion of desktop storage is unused. We propose the FreeLoader framework, which aggregates unused desktop storage space and I/O bandwidth into a shared cache/scratch space, for hosting large, immutable datasets and exploiting data access locality. This article presents the FreeLoader architecture, component design, and performance results based on our proof-of-concept prototype. Its architecture comprises contributing benefactor nodes, steered by a management layer, providing services such as data integrity, high performance, load balancing, and impact control. Our experiments show that FreeLoader is an appealing low-cost solution to storing massive datasets by delivering higher data access rates than traditional storage facilities, namely, local or remote shared file systems, storage systems, and Internet data repositories. In particular, we present novel data striping techniques that allow FreeLoader to efficiently aggregate a workstation's network communication bandwidth and local I/O bandwidth. In addition, the performance impact on the native workload of donor machines is small and can be effectively controlled. Further, we show that security features such as data encryption and integrity checks can be easily added as filters for interested clients. Finally, we demonstrate how legacy applications can use the FreeLoader API to store and retrieve datasets.
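
As a rough illustration of the striping idea mentioned in the abstract, the sketch below splits an immutable dataset into fixed-size chunks and places them round-robin across benefactor nodes, then reassembles them on read. It is a generic textbook scheme, not the striping technique developed in the article, which the abstract does not describe. The names `Benefactor`, `stripe`, and `retrieve` are hypothetical and are not part of the FreeLoader API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Benefactor:
    """Hypothetical stand-in for a desktop node donating storage space."""
    name: str
    chunks: dict[int, bytes] = field(default_factory=dict)


def stripe(data: bytes, nodes: list[Benefactor], chunk_size: int) -> int:
    """Place fixed-size chunks of `data` round-robin across `nodes`.

    Returns the number of chunks written. Assumption: plain round-robin
    placement, chosen only for illustration.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not nodes:
        raise ValueError("at least one benefactor node is required")
    count = 0
    for offset in range(0, len(data), chunk_size):
        # Chunk i goes to node i mod N, so consecutive chunks hit different nodes.
        nodes[count % len(nodes)].chunks[count] = data[offset:offset + chunk_size]
        count += 1
    return count


def retrieve(nodes: list[Benefactor], chunk_count: int) -> bytes:
    """Reassemble the dataset by fetching chunks in index order."""
    parts = [nodes[i % len(nodes)].chunks[i] for i in range(chunk_count)]
    return b"".join(parts)


if __name__ == "__main__":
    dataset = bytes(range(256)) * 4          # 1024 bytes of sample data
    donors = [Benefactor(f"desk{i}") for i in range(3)]
    n = stripe(dataset, donors, chunk_size=100)
    assert retrieve(donors, n) == dataset
```

In a real deployment the per-chunk reads would be issued to the nodes concurrently, which is where the aggregate bandwidth gain would come from; the sequential loop here only demonstrates placement and reassembly.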

Original language: English
Pages (from-to): 221-254
Number of pages: 34
Journal: ACM Transactions on Storage
Volume: 2
Issue number: 3
DOI: https://doi.org/10.1145/1168910.1168911
Publication status: Published - 1 Aug 2006
Externally published: Yes

Fingerprint

Computer workstations
Bandwidth
Data visualization
Application programming interfaces (API)
Resource allocation
Telecommunication networks
Cryptography
Pipelines
Experiments
Availability
Internet
Processing
Costs

Keywords

  • Distributed storage
  • Parallel I/O
  • Scientific data management
  • Server-less storage system
  • Storage cache
  • Storage networking
  • Storage resource management
  • Storage scavenging
  • Striped storage

ASJC Scopus subject areas

  • Hardware and Architecture

Cite this

Vazhkudai, S. S., Ma, X., Freeh, V. W., Strickland, J. W., Tammineedi, N., Simon, T., & Scott, S. L. (2006). Constructing collaborative desktop storage caches for large scientific datasets. ACM Transactions on Storage, 2(3), 221-254. https://doi.org/10.1145/1168910.1168911

@article{72f59950383e4d549edb2e6073fe5dbe,
title = "Constructing collaborative desktop storage caches for large scientific datasets",
keywords = "Distributed storage, Parallel I/O, Scientific data management, Server-less storage system, Storage cache, Storage networking, Storage resource management, Storage scavenging, Striped storage",
author = "Vazhkudai, {Sudharshan S.} and Xiaosong Ma and Freeh, {Vincent W.} and Strickland, {Jonathan W.} and Nandan Tammineedi and Tyler Simon and Scott, {Stephen L.}",
year = "2006",
month = "8",
day = "1",
doi = "10.1145/1168910.1168911",
language = "English",
volume = "2",
pages = "221--254",
journal = "ACM Transactions on Storage",
issn = "1553-3077",
publisher = "Association for Computing Machinery (ACM)",
number = "3"
}

TY - JOUR

T1 - Constructing collaborative desktop storage caches for large scientific datasets

AU - Vazhkudai, Sudharshan S.

AU - Ma, Xiaosong

AU - Freeh, Vincent W.

AU - Strickland, Jonathan W.

AU - Tammineedi, Nandan

AU - Simon, Tyler

AU - Scott, Stephen L.

PY - 2006/8/1

Y1 - 2006/8/1

KW - Distributed storage

KW - Parallel I/O

KW - Scientific data management

KW - Server-less storage system

KW - Storage cache

KW - Storage networking

KW - Storage resource management

KW - Storage scavenging

KW - Striped storage

UR - http://www.scopus.com/inward/record.url?scp=33750485344&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33750485344&partnerID=8YFLogxK

U2 - 10.1145/1168910.1168911

DO - 10.1145/1168910.1168911

M3 - Article

AN - SCOPUS:33750485344

VL - 2

SP - 221

EP - 254

JO - ACM Transactions on Storage

JF - ACM Transactions on Storage

SN - 1553-3077

IS - 3

ER -