Arabesque: A system for distributed graph mining

Carlos H C Teixeira, Alexandre J. Fonseca, Marco Serafini, Georgos Siganos, Mohammed J. Zaki, Ashraf Aboulnaga

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

51 Citations (Scopus)

Abstract

Distributed data processing platforms such as MapReduce and Pregel have substantially simplified the design and deployment of certain classes of distributed graph analytics algorithms. However, these platforms are not a good match for distributed graph mining problems, such as finding frequent subgraphs in a graph. Given an input graph, these problems require exploring a very large number of subgraphs and finding patterns that match some "interestingness" criteria desired by the user. These algorithms are important in areas such as social networks, the semantic web, and bioinformatics. In this paper, we present Arabesque, the first distributed data processing platform for implementing graph mining algorithms. Arabesque automates the process of exploring a very large number of subgraphs. It defines a high-level filter-process computational model that simplifies the development of scalable graph mining algorithms: Arabesque explores subgraphs and passes them to the application, which only needs to compute outputs and decide whether each subgraph should be extended further. We use Arabesque's API to produce distributed solutions to three fundamental graph mining problems: frequent subgraph mining, counting motifs, and finding cliques. Our implementations require only a handful of lines of code, scale to trillions of subgraphs, and in some cases are the first available distributed solutions.
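The filter-process model described in the abstract can be illustrated with a minimal, single-machine Python sketch. It does not reproduce Arabesque's actual API or its distributed execution: the names `explore`, `is_clique`, and `num_edges`, the choice to grow subgraphs one adjacent vertex at a time, and the use of a Python set to avoid revisiting the same subgraph are all hypothetical choices made for illustration. The engine enumerates connected, vertex-induced subgraphs level by level and hands each one to two application callbacks: a filter that decides whether the subgraph is kept and extended, and a process function that emits output. Clique finding and a small motif count then reuse the same engine with different callbacks.

```python
from collections import Counter
from typing import Callable, Dict, FrozenSet, List, Set

Graph = Dict[int, Set[int]]   # undirected graph as symmetric adjacency sets
Subgraph = FrozenSet[int]     # vertex-induced subgraph, named by its vertices


def explore(graph: Graph,
            filter_fn: Callable[[Graph, Subgraph], bool],
            process_fn: Callable[[Graph, Subgraph], None],
            max_size: int) -> None:
    """Hypothetical level-by-level filter-process exploration.

    A subgraph rejected by filter_fn is neither processed nor extended.
    """
    frontier: Set[Subgraph] = {frozenset([v]) for v in graph}
    for size in range(1, max_size + 1):
        survivors = [s for s in frontier if filter_fn(graph, s)]
        for sub in survivors:
            process_fn(graph, sub)
        if size == max_size:
            break
        # Grow each survivor by one adjacent vertex; the set drops
        # duplicates reached through different extension orders.
        frontier = {sub | {w}
                    for sub in survivors
                    for v in sub
                    for w in graph[v]
                    if w not in sub}


def num_edges(graph: Graph, sub: Subgraph) -> int:
    # Each undirected edge is seen from both endpoints, hence // 2.
    return sum(1 for v in sub for w in graph[v] if w in sub) // 2


def is_clique(graph: Graph, sub: Subgraph) -> bool:
    # Every pair of vertices must be adjacent (k * (k - 1) / 2 edges).
    k = len(sub)
    return num_edges(graph, sub) == k * (k - 1) // 2


# Toy input: a triangle 1-2-3 with a pendant vertex 4 attached to 3.
graph: Graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}

# Clique finding: the filter keeps only cliques, process collects them.
cliques: List[Subgraph] = []
explore(graph, is_clique, lambda g, s: cliques.append(s), max_size=3)

# Motif counting: keep every connected subgraph and tally its shape.
# For up to 3 vertices, (size, edge count) identifies the shape uniquely.
motifs: Counter = Counter()
explore(graph, lambda g, s: True,
        lambda g, s: motifs.update([(len(s), num_edges(g, s))]),
        max_size=3)
```

On this toy graph the clique run collects nine subgraphs: the four single vertices, the four edges, and the triangle {1, 2, 3}. Rejecting a non-clique early is safe because adding vertices to a vertex-induced subgraph never supplies an edge that was missing. The motif filter accepts everything, so nothing is pruned and the number of explored subgraphs can grow very quickly with `max_size`; that exploration cost is what the abstract says Arabesque is designed to handle at scale.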

Original language: English
Title of host publication: SOSP 2015 - Proceedings of the 25th ACM Symposium on Operating Systems Principles
Publisher: Association for Computing Machinery, Inc
Pages: 425-440
Number of pages: 16
ISBN (Print): 9781450338349
DOI: https://doi.org/10.1145/2815400.2815410
Publication status: Published - 4 Oct 2015
Event: 25th ACM Symposium on Operating Systems Principles, SOSP 2015 - Monterey, United States
Duration: 5 Oct 2015 - 7 Oct 2015

Other

Other: 25th ACM Symposium on Operating Systems Principles, SOSP 2015
Country: United States
City: Monterey
Period: 5/10/15 - 7/10/15

Fingerprint

Bioinformatics
Semantic Web
Application programming interfaces (API)

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software
  • Electrical and Electronic Engineering
  • Computer Science Applications

Cite this

Teixeira, C. H. C., Fonseca, A. J., Serafini, M., Siganos, G., Zaki, M. J., & Aboulnaga, A. (2015). Arabesque: A system for distributed graph mining. In SOSP 2015 - Proceedings of the 25th ACM Symposium on Operating Systems Principles (pp. 425-440). Association for Computing Machinery, Inc. https://doi.org/10.1145/2815400.2815410

Arabesque: A system for distributed graph mining. / Teixeira, Carlos H C; Fonseca, Alexandre J.; Serafini, Marco; Siganos, Georgos; Zaki, Mohammed J.; Aboulnaga, Ashraf.

SOSP 2015 - Proceedings of the 25th ACM Symposium on Operating Systems Principles. Association for Computing Machinery, Inc, 2015. p. 425-440.

Teixeira, CHC, Fonseca, AJ, Serafini, M, Siganos, G, Zaki, MJ & Aboulnaga, A 2015, Arabesque: A system for distributed graph mining. in SOSP 2015 - Proceedings of the 25th ACM Symposium on Operating Systems Principles. Association for Computing Machinery, Inc, pp. 425-440, 25th ACM Symposium on Operating Systems Principles, SOSP 2015, Monterey, United States, 5/10/15. https://doi.org/10.1145/2815400.2815410

Teixeira CHC, Fonseca AJ, Serafini M, Siganos G, Zaki MJ, Aboulnaga A. Arabesque: A system for distributed graph mining. In SOSP 2015 - Proceedings of the 25th ACM Symposium on Operating Systems Principles. Association for Computing Machinery, Inc. 2015. p. 425-440 https://doi.org/10.1145/2815400.2815410

Teixeira, Carlos H C ; Fonseca, Alexandre J. ; Serafini, Marco ; Siganos, Georgos ; Zaki, Mohammed J. ; Aboulnaga, Ashraf. / Arabesque: A system for distributed graph mining. SOSP 2015 - Proceedings of the 25th ACM Symposium on Operating Systems Principles. Association for Computing Machinery, Inc, 2015. pp. 425-440

@inproceedings{a914c336785443519d8438ac9969700b,
title = "Arabesque: A system for distributed graph mining",
abstract = "Distributed data processing platforms such as MapReduce and Pregel have substantially simplified the design and deployment of certain classes of distributed graph analytics algorithms. However, these platforms do not represent a good match for distributed graph mining problems, as for example finding frequent subgraphs in a graph. Given an input graph, these problems require exploring a very large number of subgraphs and finding patterns that match some {"}interestingness{"} criteria desired by the user. These algorithms are very important for areas such as social networks, semantic web, and bioinformatics. In this paper, we present Arabesque, the first distributed data processing platform for implementing graph mining algorithms. Arabesque automates the process of exploring a very large number of subgraphs. It defines a high-level filter-process computational model that simplifies the development of scalable graph mining algorithms: Arabesque explores subgraphs and passes them to the application, which must simply compute outputs and decide whether the subgraph should be further extended. We use Arabesque's API to produce distributed solutions to three fundamental graph mining problems: frequent subgraph mining, counting motifs, and finding cliques. Our implementations require a handful of lines of code, scale to trillions of subgraphs, and represent in some cases the first available distributed solutions.",
author = "Teixeira, {Carlos H C} and Fonseca, {Alexandre J.} and Marco Serafini and Georgos Siganos and Zaki, {Mohammed J.} and Ashraf Aboulnaga",
year = "2015",
month = "10",
day = "4",
doi = "10.1145/2815400.2815410",
language = "English",
isbn = "9781450338349",
pages = "425--440",
booktitle = "SOSP 2015 - Proceedings of the 25th ACM Symposium on Operating Systems Principles",
publisher = "Association for Computing Machinery, Inc",

}

TY - GEN

T1 - Arabesque

T2 - A system for distributed graph mining

AU - Teixeira, Carlos H C

AU - Fonseca, Alexandre J.

AU - Serafini, Marco

AU - Siganos, Georgos

AU - Zaki, Mohammed J.

AU - Aboulnaga, Ashraf

PY - 2015/10/4

Y1 - 2015/10/4

N2 - Distributed data processing platforms such as MapReduce and Pregel have substantially simplified the design and deployment of certain classes of distributed graph analytics algorithms. However, these platforms do not represent a good match for distributed graph mining problems, as for example finding frequent subgraphs in a graph. Given an input graph, these problems require exploring a very large number of subgraphs and finding patterns that match some "interestingness" criteria desired by the user. These algorithms are very important for areas such as social networks, semantic web, and bioinformatics. In this paper, we present Arabesque, the first distributed data processing platform for implementing graph mining algorithms. Arabesque automates the process of exploring a very large number of subgraphs. It defines a high-level filter-process computational model that simplifies the development of scalable graph mining algorithms: Arabesque explores subgraphs and passes them to the application, which must simply compute outputs and decide whether the subgraph should be further extended. We use Arabesque's API to produce distributed solutions to three fundamental graph mining problems: frequent subgraph mining, counting motifs, and finding cliques. Our implementations require a handful of lines of code, scale to trillions of subgraphs, and represent in some cases the first available distributed solutions.

AB - Distributed data processing platforms such as MapReduce and Pregel have substantially simplified the design and deployment of certain classes of distributed graph analytics algorithms. However, these platforms do not represent a good match for distributed graph mining problems, as for example finding frequent subgraphs in a graph. Given an input graph, these problems require exploring a very large number of subgraphs and finding patterns that match some "interestingness" criteria desired by the user. These algorithms are very important for areas such as social networks, semantic web, and bioinformatics. In this paper, we present Arabesque, the first distributed data processing platform for implementing graph mining algorithms. Arabesque automates the process of exploring a very large number of subgraphs. It defines a high-level filter-process computational model that simplifies the development of scalable graph mining algorithms: Arabesque explores subgraphs and passes them to the application, which must simply compute outputs and decide whether the subgraph should be further extended. We use Arabesque's API to produce distributed solutions to three fundamental graph mining problems: frequent subgraph mining, counting motifs, and finding cliques. Our implementations require a handful of lines of code, scale to trillions of subgraphs, and represent in some cases the first available distributed solutions.

UR - http://www.scopus.com/inward/record.url?scp=84957957114&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84957957114&partnerID=8YFLogxK

U2 - 10.1145/2815400.2815410

DO - 10.1145/2815400.2815410

M3 - Conference contribution

AN - SCOPUS:84957957114

SN - 9781450338349

SP - 425

EP - 440

BT - SOSP 2015 - Proceedings of the 25th ACM Symposium on Operating Systems Principles

PB - Association for Computing Machinery, Inc

ER -