Scalable Dynamic Graph Summarization

Ioanna Tsalouchidou, Francesco Bonchi, Gianmarco De Francisci Morales, Ricardo Baeza-Yates

Research output: Contribution to journal (Article)

Abstract

Large-scale dynamic interaction graphs can be challenging to process and store because of their size and the continuous change of communication patterns between nodes. In this work we address the problem of summarizing large-scale dynamic graphs while preserving the evolution of their structure and communication patterns. Our approach groups the nodes of the graph into supernodes according to their connectivity and communication patterns. The resulting summary graph preserves the information about the evolution of the graph within a time window. We propose two online algorithms for summarizing this type of graph. Our baseline algorithm, kC, is based on clustering; it is fast but rather memory-expensive. The second method, μC, reduces the memory requirements by introducing an intermediate step that keeps statistics of the clustering of the previous rounds. Both algorithms are distributed by design, and we implement them on the Apache Spark framework to address scalability for large-scale graphs and massive streams. We apply our methods to several dynamic graphs and show that the summary graphs can be used efficiently to answer temporal and probabilistic graph queries.
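
The supernode idea in the abstract can be illustrated with a minimal sketch. This is not the kC or μC algorithm: it assumes the node-to-supernode assignment is already given (for example, by any clustering step) and that the time window is a list of 0/1 adjacency matrices. The function names `summarize` and `expected_degree`, and the choice of edge density as the summary weight, are illustrative assumptions and do not come from the paper.

```python
from itertools import product


def summarize(window, assignment):
    """Build a per-timestep summary of a window of adjacency matrices.

    window:     list of n x n 0/1 adjacency matrices, one per timestep.
    assignment: list mapping node index -> supernode id in 0..k-1
                (assumed to be given; the clustering itself is not shown).
    Returns summary[t][a][b], the fraction of ordered node pairs
    (u in a, v in b, u != v) that are connected at timestep t.
    """
    k = max(assignment) + 1
    members = [[u for u, s in enumerate(assignment) if s == a] for a in range(k)]
    summary = []
    for adj in window:
        density = [[0.0] * k for _ in range(k)]
        for a, b in product(range(k), repeat=2):
            pairs = [(u, v) for u in members[a] for v in members[b] if u != v]
            if pairs:
                density[a][b] = sum(adj[u][v] for u, v in pairs) / len(pairs)
        summary.append(density)
    return summary


def expected_degree(summary, assignment, node, t):
    """Estimate the out-degree of `node` at timestep t from the summary alone."""
    a = assignment[node]
    k = len(summary[t])
    sizes = [assignment.count(s) for s in range(k)]
    # A node cannot link to itself, so its own supernode offers one fewer target.
    return sum(summary[t][a][b] * (sizes[b] - (1 if b == a else 0)) for b in range(k))
```

Treating each density as the probability that an edge exists between two supernodes at a given timestep, a query such as the expected degree of a node can be answered from the summary without touching the original graph. The estimate is exact only when the nodes inside a supernode have identical connectivity.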

Original language: English
Journal: IEEE Transactions on Knowledge and Data Engineering
Publication status: Accepted/In press - 1 Jan 2018

Keywords

  • Clustering algorithms
  • Heuristic algorithms
  • Microsoft Windows
  • Scalability
  • Silicon
  • Tensile stress

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
