Abstract
Distributed data processing platforms such as MapReduce and Pregel have substantially simplified the design and deployment of certain classes of distributed graph analytics algorithms. However, these platforms are not a good match for distributed graph mining problems, such as finding frequent subgraphs in a graph. Given an input graph, these problems require exploring a very large number of subgraphs and finding patterns that match some "interestingness" criteria specified by the user. These algorithms are important in areas such as social networks, the semantic web, and bioinformatics. In this paper, we present Arabesque, the first distributed data processing platform for implementing graph mining algorithms. Arabesque automates the process of exploring a very large number of subgraphs. It defines a high-level filter-process computational model that simplifies the development of scalable graph mining algorithms: Arabesque explores subgraphs and passes them to the application, which must simply compute outputs and decide whether the subgraph should be further extended. We use Arabesque's API to produce distributed solutions to three fundamental graph mining problems: frequent subgraph mining, counting motifs, and finding cliques. Our implementations require a handful of lines of code, scale to trillions of subgraphs, and in some cases represent the first available distributed solutions.
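The filter-process model described in the abstract can be illustrated with a small single-machine sketch. This is not Arabesque's API and not the authors' implementation: the names `explore`, `is_clique`, `emit`, and `TRIANGLE_WITH_TAIL` are hypothetical, the graph is a plain adjacency dictionary, and nothing here is distributed. The sketch only assumes what the abstract states: the system enumerates subgraphs, and the application supplies a filter (should this subgraph be kept and extended?) and a process step (what output does it produce?). Clique finding is used as the example application.

```python
from itertools import combinations


def explore(graph, filter_fn, process_fn, max_size):
    """Enumerate connected vertex sets level by level (hypothetical helper).

    graph: dict mapping each vertex to the set of its neighbours.
    filter_fn(graph, subgraph) -> bool: keep and extend this subgraph?
    process_fn(graph, subgraph) -> output value for a kept subgraph.
    """
    # Level 1: every single vertex is a candidate subgraph.
    frontier = {frozenset([v]) for v in graph}
    outputs = []
    size = 1
    while frontier and size <= max_size:
        kept = [sub for sub in frontier if filter_fn(graph, sub)]
        for sub in kept:
            outputs.append(process_fn(graph, sub))
        # Extend each kept subgraph by one adjacent vertex. Using a set of
        # frozensets removes duplicates reached through different orders.
        frontier = {
            sub | {u}
            for sub in kept
            for v in sub
            for u in graph[v]
            if u not in sub
        }
        size += 1
    return outputs


def is_clique(graph, sub):
    """Filter: every pair of vertices in the subgraph must be adjacent."""
    return all(u in graph[v] for u, v in combinations(sub, 2))


def emit(graph, sub):
    """Process: output the subgraph as a sorted tuple of vertices."""
    return tuple(sorted(sub))


# A triangle 1-2-3 with an extra edge 3-4 (illustrative input only).
TRIANGLE_WITH_TAIL = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}

if __name__ == "__main__":
    cliques = explore(TRIANGLE_WITH_TAIL, is_clique, emit, max_size=3)
    print(sorted(c for c in cliques if len(c) == 3))
```

Discarding a subgraph that fails the filter is safe here because the clique property is anti-monotonic: if a vertex set is not a clique, no superset of it can be one. A filter without that property would need a different pruning rule.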
Original language | English |
---|---|
Title of host publication | SOSP 2015 - Proceedings of the 25th ACM Symposium on Operating Systems Principles |
Publisher | Association for Computing Machinery, Inc |
Pages | 425-440 |
Number of pages | 16 |
ISBN (Print) | 9781450338349 |
DOIs | 10.1145/2815400.2815410 |
Publication status | Published - 4 Oct 2015 |
Event | 25th ACM Symposium on Operating Systems Principles, SOSP 2015 - Monterey, United States. Duration: 5 Oct 2015 → 7 Oct 2015 |
Other
Other | 25th ACM Symposium on Operating Systems Principles, SOSP 2015 |
---|---|
Country | United States |
City | Monterey |
Period | 5/10/15 → 7/10/15 |
ASJC Scopus subject areas
- Hardware and Architecture
- Software
- Electrical and Electronic Engineering
- Computer Science Applications
Cite this
Arabesque: A system for distributed graph mining. / Teixeira, Carlos H C; Fonseca, Alexandre J.; Serafini, Marco; Siganos, Georgos; Zaki, Mohammed J.; Aboulnaga, Ashraf.
SOSP 2015 - Proceedings of the 25th ACM Symposium on Operating Systems Principles. Association for Computing Machinery, Inc, 2015. p. 425-440.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - Arabesque
T2 - A system for distributed graph mining
AU - Teixeira, Carlos H C
AU - Fonseca, Alexandre J.
AU - Serafini, Marco
AU - Siganos, Georgos
AU - Zaki, Mohammed J.
AU - Aboulnaga, Ashraf
PY - 2015/10/4
Y1 - 2015/10/4
UR - http://www.scopus.com/inward/record.url?scp=84957957114&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84957957114&partnerID=8YFLogxK
U2 - 10.1145/2815400.2815410
DO - 10.1145/2815400.2815410
M3 - Conference contribution
AN - SCOPUS:84957957114
SN - 9781450338349
SP - 425
EP - 440
BT - SOSP 2015 - Proceedings of the 25th ACM Symposium on Operating Systems Principles
PB - Association for Computing Machinery, Inc
ER -