Scalable Dynamic Graph Summarization

Ioanna Tsalouchidou, Francesco Bonchi, Gianmarco Morales, Ricardo Baeza-Yates

Research output: Contribution to journal (Article)

Abstract

Large-scale dynamic interaction graphs can be challenging to process and store because of their size and the continuous change of communication patterns between nodes. In this work, we address the problem of summarizing large-scale dynamic graphs while preserving the evolution of their structure and communication patterns. Our approach groups the nodes of the graph into supernodes according to their connectivity and communication patterns. The resulting summary graph preserves information about the evolution of the graph within a time window. We propose two online algorithms for summarizing this type of graph. Our baseline algorithm, kC, is based on clustering; it is fast but rather memory-intensive. The second method, μC, reduces the memory requirements by introducing an intermediate step that keeps statistics of the clustering from previous rounds. Both algorithms are distributed by design, and we implement them on the Apache Spark framework to scale to large graphs and massive streams. We apply our methods to several dynamic graphs and show that the summary graphs can be used to answer temporal and probabilistic graph queries efficiently.
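The supernode idea in the abstract can be illustrated with a minimal sketch. It is not the paper's kC or μC algorithm: it assumes a node-to-supernode assignment is already given (for example, by some clustering step), handles a single static snapshot rather than a time window, and uses edge density between supernodes as the superedge weight, which is a common choice in graph summarization but an assumption here. The names `summarize` and `edge_probability` are hypothetical.

```python
from collections import defaultdict


def summarize(edges, assignment):
    """Collapse an undirected graph into a summary graph of supernodes.

    edges: iterable of (u, v) pairs with u != v, each edge listed once.
    assignment: dict node -> supernode id (assumed given, not computed here).
    Returns a dict mapping a sorted supernode pair (a, b) to the fraction
    of possible node pairs between a and b that are actual edges.
    """
    sizes = defaultdict(int)
    for supernode in assignment.values():
        sizes[supernode] += 1

    counts = defaultdict(int)
    for u, v in edges:
        a, b = sorted((assignment[u], assignment[v]))
        counts[(a, b)] += 1

    density = {}
    for (a, b), count in counts.items():
        if a == b:
            possible = sizes[a] * (sizes[a] - 1) // 2  # pairs inside one supernode
        else:
            possible = sizes[a] * sizes[b]  # pairs across two supernodes
        density[(a, b)] = count / possible
    return density


def edge_probability(density, assignment, u, v):
    """Estimate from the summary alone how likely edge (u, v) is to exist."""
    a, b = sorted((assignment[u], assignment[v]))
    return density.get((a, b), 0.0)


# Toy graph: a triangle {1, 2, 3}, a pair {4, 5}, and an isolated node 6.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
assignment = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "C"}
density = summarize(edges, assignment)
```

In this sketch, a probabilistic query such as "does an edge exist between nodes 1 and 5?" is answered from the summary as the density of the superedge between their supernodes, without consulting the original edge list.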

Original language: English
Journal: IEEE Transactions on Knowledge and Data Engineering
DOI: 10.1109/TKDE.2018.2884471
Publication status: Accepted/In press - 1 Jan 2018

Keywords

  • Clustering algorithms
  • Heuristic algorithms
  • Microsoft Windows
  • Scalability
  • Silicon
  • Tensile stress

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

Scalable Dynamic Graph Summarization. / Tsalouchidou, Ioanna; Bonchi, Francesco; Morales, Gianmarco; Baeza-Yates, Ricardo.

In: IEEE Transactions on Knowledge and Data Engineering, 01.01.2018.

@article{468e2a35da8843e382126503efcfd346,
title = "Scalable Dynamic Graph Summarization",
keywords = "Clustering algorithms, Heuristic algorithms, Microsoft Windows, Scalability, Silicon, Tensile stress",
author = "Ioanna Tsalouchidou and Francesco Bonchi and Gianmarco Morales and Ricardo Baeza-Yates",
year = "2018",
month = "1",
day = "1",
doi = "10.1109/TKDE.2018.2884471",
language = "English",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE Computer Society",

}

TY - JOUR

T1 - Scalable Dynamic Graph Summarization

AU - Tsalouchidou, Ioanna

AU - Bonchi, Francesco

AU - Morales, Gianmarco

AU - Baeza-Yates, Ricardo

PY - 2018/1/1

Y1 - 2018/1/1

KW - Clustering algorithms

KW - Heuristic algorithms

KW - Microsoft Windows

KW - Scalability

KW - Silicon

KW - Tensile stress

UR - http://www.scopus.com/inward/record.url?scp=85057895648&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85057895648&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2018.2884471

DO - 10.1109/TKDE.2018.2884471

M3 - Article

AN - SCOPUS:85057895648

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

SN - 1041-4347

ER -