QFrag

Distributed graph search via subgraph isomorphism

Research output: Chapter in Book/Report/Conference proceedingConference contribution

4 Citations (Scopus)

Abstract

This paper introduces QFrag, a distributed system for graph search on top of bulk synchronous processing (BSP) systems such as MapReduce and Spark. Searching for patterns in graphs is an important and computationally complex problem. Most current distributed search systems scale to graphs that do not fit in main memory by partitioning the input graph. For analytical queries, however, this approach entails running expensive distributed joins on large intermediate data. In this paper we explore an alternative approach: replicating the input graph and running independent parallel instances of a sequential graph search algorithm. In principle, this approach leads us to an embarrassingly parallel problem, since workers can complete their tasks in parallel without coordination. However, the skew present in natural graphs makes this problem a deceitfully parallel one, i.e., an embarrassingly parallel problem with poor load balancing. We therefore introduce a task fragmentation technique that avoids stragglers but at the same time minimizes coordination. Our evaluation shows that QFrag outperforms BSP-based systems by orders of magnitude, and performs similar to asynchronous MPI-based systems on simple queries. Furthermore, it is able to run computationally complex analytical queries that other systems are unable to handle.

Original languageEnglish
Title of host publicationSoCC 2017 - Proceedings of the 2017 Symposium on Cloud Computing
PublisherAssociation for Computing Machinery, Inc
Pages214-228
Number of pages15
ISBN (Electronic)9781450350280
DOIs
Publication statusPublished - 24 Sep 2017
Event2017 Symposium on Cloud Computing, SoCC 2017 - Santa Clara, United States
Duration: 24 Sep 201727 Sep 2017

Other

Other2017 Symposium on Cloud Computing, SoCC 2017
CountryUnited States
CitySanta Clara
Period24/9/1727/9/17

Fingerprint

Graph Search
Subgraph
Isomorphism
Graph in graph theory
Processing
Electric sparks
Resource allocation
Query
Data storage equipment
Graph Algorithms
MapReduce
Fragmentation
Load Balancing
Skew
Join
Search Algorithm
Distributed Systems
Partitioning
Minimise
Alternatives

Keywords

  • Bulk synchronous processing
  • Graph search
  • Load balancing

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Theoretical Computer Science

Cite this

Serafini, M., Morales, G., & Siganos, G. (2017). QFrag: Distributed graph search via subgraph isomorphism. In SoCC 2017 - Proceedings of the 2017 Symposium on Cloud Computing (pp. 214-228). Association for Computing Machinery, Inc. https://doi.org/10.1145/3127479.3131625

QFrag : Distributed graph search via subgraph isomorphism. / Serafini, Marco; Morales, Gianmarco; Siganos, Georgos.

SoCC 2017 - Proceedings of the 2017 Symposium on Cloud Computing. Association for Computing Machinery, Inc, 2017. p. 214-228.

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Serafini, M, Morales, G & Siganos, G 2017, QFrag: Distributed graph search via subgraph isomorphism. in SoCC 2017 - Proceedings of the 2017 Symposium on Cloud Computing. Association for Computing Machinery, Inc, pp. 214-228, 2017 Symposium on Cloud Computing, SoCC 2017, Santa Clara, United States, 24/9/17. https://doi.org/10.1145/3127479.3131625
Serafini M, Morales G, Siganos G. QFrag: Distributed graph search via subgraph isomorphism. In SoCC 2017 - Proceedings of the 2017 Symposium on Cloud Computing. Association for Computing Machinery, Inc. 2017. p. 214-228 https://doi.org/10.1145/3127479.3131625
Serafini, Marco ; Morales, Gianmarco ; Siganos, Georgos. / QFrag : Distributed graph search via subgraph isomorphism. SoCC 2017 - Proceedings of the 2017 Symposium on Cloud Computing. Association for Computing Machinery, Inc, 2017. pp. 214-228
@inproceedings{df938ff02a034c189cf3c303b2dabc55,
title = "QFrag: Distributed graph search via subgraph isomorphism",
abstract = "This paper introduces QFrag, a distributed system for graph search on top of bulk synchronous processing (BSP) systems such as MapReduce and Spark. Searching for patterns in graphs is an important and computationally complex problem. Most current distributed search systems scale to graphs that do not fit in main memory by partitioning the input graph. For analytical queries, however, this approach entails running expensive distributed joins on large intermediate data. In this paper we explore an alternative approach: replicating the input graph and running independent parallel instances of a sequential graph search algorithm. In principle, this approach leads us to an embarrassingly parallel problem, since workers can complete their tasks in parallel without coordination. However, the skew present in natural graphs makes this problem a deceitfully parallel one, i.e., an embarrassingly parallel problem with poor load balancing. We therefore introduce a task fragmentation technique that avoids stragglers but at the same time minimizes coordination. Our evaluation shows that QFrag outperforms BSP-based systems by orders of magnitude, and performs similar to asynchronous MPI-based systems on simple queries. Furthermore, it is able to run computationally complex analytical queries that other systems are unable to handle.",
keywords = "Bulk synchronous processing, Graph search, Load balancing",
author = "Marco Serafini and Gianmarco Morales and Georgos Siganos",
year = "2017",
month = "9",
day = "24",
doi = "10.1145/3127479.3131625",
language = "English",
pages = "214--228",
booktitle = "SoCC 2017 - Proceedings of the 2017 Symposium on Cloud Computing",
publisher = "Association for Computing Machinery, Inc",

}

TY - GEN

T1 - QFrag

T2 - Distributed graph search via subgraph isomorphism

AU - Serafini, Marco

AU - Morales, Gianmarco

AU - Siganos, Georgos

PY - 2017/9/24

Y1 - 2017/9/24

N2 - This paper introduces QFrag, a distributed system for graph search on top of bulk synchronous processing (BSP) systems such as MapReduce and Spark. Searching for patterns in graphs is an important and computationally complex problem. Most current distributed search systems scale to graphs that do not fit in main memory by partitioning the input graph. For analytical queries, however, this approach entails running expensive distributed joins on large intermediate data. In this paper we explore an alternative approach: replicating the input graph and running independent parallel instances of a sequential graph search algorithm. In principle, this approach leads us to an embarrassingly parallel problem, since workers can complete their tasks in parallel without coordination. However, the skew present in natural graphs makes this problem a deceitfully parallel one, i.e., an embarrassingly parallel problem with poor load balancing. We therefore introduce a task fragmentation technique that avoids stragglers but at the same time minimizes coordination. Our evaluation shows that QFrag outperforms BSP-based systems by orders of magnitude, and performs similar to asynchronous MPI-based systems on simple queries. Furthermore, it is able to run computationally complex analytical queries that other systems are unable to handle.

AB - This paper introduces QFrag, a distributed system for graph search on top of bulk synchronous processing (BSP) systems such as MapReduce and Spark. Searching for patterns in graphs is an important and computationally complex problem. Most current distributed search systems scale to graphs that do not fit in main memory by partitioning the input graph. For analytical queries, however, this approach entails running expensive distributed joins on large intermediate data. In this paper we explore an alternative approach: replicating the input graph and running independent parallel instances of a sequential graph search algorithm. In principle, this approach leads us to an embarrassingly parallel problem, since workers can complete their tasks in parallel without coordination. However, the skew present in natural graphs makes this problem a deceitfully parallel one, i.e., an embarrassingly parallel problem with poor load balancing. We therefore introduce a task fragmentation technique that avoids stragglers but at the same time minimizes coordination. Our evaluation shows that QFrag outperforms BSP-based systems by orders of magnitude, and performs similar to asynchronous MPI-based systems on simple queries. Furthermore, it is able to run computationally complex analytical queries that other systems are unable to handle.

KW - Bulk synchronous processing

KW - Graph search

KW - Load balancing

UR - http://www.scopus.com/inward/record.url?scp=85032440764&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85032440764&partnerID=8YFLogxK

U2 - 10.1145/3127479.3131625

DO - 10.1145/3127479.3131625

M3 - Conference contribution

SP - 214

EP - 228

BT - SoCC 2017 - Proceedings of the 2017 Symposium on Cloud Computing

PB - Association for Computing Machinery, Inc

ER -