Pantheon

Exascale file system search for scientific computing

Joseph L. Naps, Mohamed Mokbel, David H C Du

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

4 Citations (Scopus)

Abstract

Modern scientific computing generates petabytes of data in billions of files that must be managed. These files are often organized by name in a hierarchical directory tree common to most file systems. As the scale of data has increased, this has proven to be a poor method of file organization. Recent tools allow users to navigate files based on file metadata attributes, providing more meaningful organization. To make this metadata searchable, it is often stored on separate metadata servers. This solution has drawbacks, however, because of the multi-tiered architecture of many large-scale storage solutions. As data is modified or moved between tiers of storage, the overhead of maintaining consistency between these tiers and the metadata server becomes very large. As scientific systems continue to push towards exascale, this problem will become more pronounced. A simpler option is to bypass the overhead of the metadata server and use the metadata storage inherent to the file system. Few tools currently exist, however, to perform such operations at large scale. This paper introduces the prototype for Pantheon, a file system search tool designed to use the metadata storage within the file system itself, bypassing the overhead from metadata servers. Pantheon is also designed with the scientific community's push towards exascale computing in mind. It combines hierarchical partitioning, query optimization, and indexing to perform efficient metadata searches over large-scale file systems.
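
The idea of searching the metadata that the file system already stores, rather than a separate metadata server, can be illustrated with a minimal sketch. This is not Pantheon's implementation, which the abstract does not detail. The function names `scan_partition` and `search`, the choice of one partition per top-level subdirectory, and the use of a thread pool are all assumptions made for illustration, and the sketch omits query optimization and indexing entirely.

```python
import os
import stat
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# A predicate receives the stat() metadata of one file and decides whether it matches.
Predicate = Callable[[os.stat_result], bool]


def scan_partition(root: str, predicate: Predicate) -> List[str]:
    """Walk one subtree and return regular files whose metadata satisfies the predicate."""
    matches: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # metadata comes from the file system itself
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and predicate(st):
                matches.append(path)
    return matches


def search(root: str, predicate: Predicate, workers: int = 4) -> List[str]:
    """Hypothetical partitioned search: each top-level subdirectory is scanned independently."""
    partitions: List[str] = []
    matches: List[str] = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                partitions.append(entry.path)
            elif entry.is_file(follow_symlinks=False):
                if predicate(entry.stat(follow_symlinks=False)):
                    matches.append(entry.path)
    # Scan the partitions concurrently and merge the per-partition results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(lambda p: scan_partition(p, predicate), partitions):
            matches.extend(result)
    return sorted(matches)
```

A call such as `search("/data", lambda st: st.st_size >= 1024)` would return the regular files under the hypothetical directory `/data` that are at least 1 KiB in size. Because every query walks the tree, the cost grows with the number of files, which is the kind of overhead that partitioning and indexing are meant to reduce.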

Original language: English
Title of host publication: Scientific and Statistical Database Management - 23rd International Conference, SSDBM 2011, Proceedings
Pages: 461-469
Number of pages: 9
DOIs: https://doi.org/10.1007/978-3-642-22351-8_29
Publication status: Published - 11 Aug 2011
Externally published: Yes
Event: 23rd International Conference on Scientific and Statistical Database Management, SSDBM 2011 - Portland, OR, United States
Duration: 20 Jul 2011 → 22 Jul 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6809 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 23rd International Conference on Scientific and Statistical Database Management, SSDBM 2011
Country: United States
City: Portland, OR
Period: 20/7/11 → 22/7/11

Fingerprint

Natural sciences computing
Scientific Computing
File System
Metadata
Computer systems
Servers
Server
File organization
Query Optimization
Large-scale Systems
Indexing
Partitioning
Continue
Attribute
Prototype

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Naps, J. L., Mokbel, M., & Du, D. H. C. (2011). Pantheon: Exascale file system search for scientific computing. In Scientific and Statistical Database Management - 23rd International Conference, SSDBM 2011, Proceedings (pp. 461-469). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6809 LNCS). https://doi.org/10.1007/978-3-642-22351-8_29

@inproceedings{1c8b3ff74f3647f9975ed85cc9d3a1ca,
title = "Pantheon: Exascale file system search for scientific computing",
author = "Naps, {Joseph L.} and Mohamed Mokbel and Du, {David H C}",
year = "2011",
month = "8",
day = "11",
doi = "10.1007/978-3-642-22351-8_29",
language = "English",
isbn = "9783642223501",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "461--469",
booktitle = "Scientific and Statistical Database Management - 23rd International Conference, SSDBM 2011, Proceedings",

}

TY - GEN

T1 - Pantheon

T2 - Exascale file system search for scientific computing

AU - Naps, Joseph L.

AU - Mokbel, Mohamed

AU - Du, David H C

PY - 2011/8/11

Y1 - 2011/8/11

UR - http://www.scopus.com/inward/record.url?scp=79961182850&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79961182850&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-22351-8_29

DO - 10.1007/978-3-642-22351-8_29

M3 - Conference contribution

SN - 9783642223501

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 461

EP - 469

BT - Scientific and Statistical Database Management - 23rd International Conference, SSDBM 2011, Proceedings

ER -