Graph stream summarization

From big bang to big crunch

Nan Tang, Qing Chen, Prasenjit Mitra

Research output: Chapter in Book/Report/Conference proceedingConference contribution

13 Citations (Scopus)

Abstract

A graph stream, which refers to the graph with edges being updated sequentially in a form of a stream, has important applications in cyber security and social networks. Due to the sheer volume and highly dynamic nature of graph streams, the practical way of handling them is by summarization. Given a graph stream G, directed or undirected, the problem of graph stream summarization is to summarize G as SG with a much smaller (sublinear) space, linear construction time and constant maintenance cost for each edge update, such that SG allows many queries over G to be approximately conducted efficiently. The widely used practice of summarizing data streams is to treat each stream element independently by e.g., hash-or sample-based methods, without maintaining the connections (or relationships) between elements. Hence, existing methods can only solve ad-hoc problems, without supporting diversified and complicated analytics over graph streams. We present TCM, a novel graph stream summary. Given an incoming edge, it summarizes both node and edge information in constant time. Consequently, the summary forms a graphical sketch where edges capture the connections inside elements, and nodes maintain relationships across elements. We discuss a wide range of supported queries and establish some error bounds. In addition, we experimentally show that TCM can effectively and efficiently support analytics over graph streams beyond the power of existing sketches, which demonstrates its potential to start a new line of research and applications in graph stream management.

Original languageEnglish
Title of host publicationSIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data
PublisherAssociation for Computing Machinery
Pages1481-1496
Number of pages16
Volume26-June-2016
ISBN (Electronic)9781450335317
DOIs
Publication statusPublished - 26 Jun 2016
Event2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016 - San Francisco, United States
Duration: 26 Jun 20161 Jul 2016

Other

Other2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016
CountryUnited States
CitySan Francisco
Period26/6/161/7/16

Fingerprint

Costs

Keywords

  • Data streams
  • Graph streams
  • Sketch
  • Summarization

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Tang, N., Chen, Q., & Mitra, P. (2016). Graph stream summarization: From big bang to big crunch. In SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data (Vol. 26-June-2016, pp. 1481-1496). Association for Computing Machinery. https://doi.org/10.1145/2882903.2915223

Graph stream summarization : From big bang to big crunch. / Tang, Nan; Chen, Qing; Mitra, Prasenjit.

SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data. Vol. 26-June-2016 Association for Computing Machinery, 2016. p. 1481-1496.

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Tang, N, Chen, Q & Mitra, P 2016, Graph stream summarization: From big bang to big crunch. in SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data. vol. 26-June-2016, Association for Computing Machinery, pp. 1481-1496, 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016, San Francisco, United States, 26/6/16. https://doi.org/10.1145/2882903.2915223
Tang N, Chen Q, Mitra P. Graph stream summarization: From big bang to big crunch. In SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data. Vol. 26-June-2016. Association for Computing Machinery. 2016. p. 1481-1496 https://doi.org/10.1145/2882903.2915223
Tang, Nan ; Chen, Qing ; Mitra, Prasenjit. / Graph stream summarization : From big bang to big crunch. SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data. Vol. 26-June-2016 Association for Computing Machinery, 2016. pp. 1481-1496
@inproceedings{00feb8ac826346c781dda6bea9488124,
title = "Graph stream summarization: From big bang to big crunch",
abstract = "A graph stream, which refers to the graph with edges being updated sequentially in a form of a stream, has important applications in cyber security and social networks. Due to the sheer volume and highly dynamic nature of graph streams, the practical way of handling them is by summarization. Given a graph stream G, directed or undirected, the problem of graph stream summarization is to summarize G as SG with a much smaller (sublinear) space, linear construction time and constant maintenance cost for each edge update, such that SG allows many queries over G to be approximately conducted efficiently. The widely used practice of summarizing data streams is to treat each stream element independently by e.g., hash-or sample-based methods, without maintaining the connections (or relationships) between elements. Hence, existing methods can only solve ad-hoc problems, without supporting diversified and complicated analytics over graph streams. We present TCM, a novel graph stream summary. Given an incoming edge, it summarizes both node and edge information in constant time. Consequently, the summary forms a graphical sketch where edges capture the connections inside elements, and nodes maintain relationships across elements. We discuss a wide range of supported queries and establish some error bounds. In addition, we experimentally show that TCM can effectively and efficiently support analytics over graph streams beyond the power of existing sketches, which demonstrates its potential to start a new line of research and applications in graph stream management.",
keywords = "Data streams, Graph streams, Sketch, Summarization",
author = "Nan Tang and Qing Chen and Prasenjit Mitra",
year = "2016",
month = "6",
day = "26",
doi = "10.1145/2882903.2915223",
language = "English",
volume = "26-June-2016",
pages = "1481--1496",
booktitle = "SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data",
publisher = "Association for Computing Machinery",

}

TY - GEN

T1 - Graph stream summarization

T2 - From big bang to big crunch

AU - Tang, Nan

AU - Chen, Qing

AU - Mitra, Prasenjit

PY - 2016/6/26

Y1 - 2016/6/26

N2 - A graph stream, which refers to the graph with edges being updated sequentially in a form of a stream, has important applications in cyber security and social networks. Due to the sheer volume and highly dynamic nature of graph streams, the practical way of handling them is by summarization. Given a graph stream G, directed or undirected, the problem of graph stream summarization is to summarize G as SG with a much smaller (sublinear) space, linear construction time and constant maintenance cost for each edge update, such that SG allows many queries over G to be approximately conducted efficiently. The widely used practice of summarizing data streams is to treat each stream element independently by e.g., hash-or sample-based methods, without maintaining the connections (or relationships) between elements. Hence, existing methods can only solve ad-hoc problems, without supporting diversified and complicated analytics over graph streams. We present TCM, a novel graph stream summary. Given an incoming edge, it summarizes both node and edge information in constant time. Consequently, the summary forms a graphical sketch where edges capture the connections inside elements, and nodes maintain relationships across elements. We discuss a wide range of supported queries and establish some error bounds. In addition, we experimentally show that TCM can effectively and efficiently support analytics over graph streams beyond the power of existing sketches, which demonstrates its potential to start a new line of research and applications in graph stream management.

AB - A graph stream, which refers to the graph with edges being updated sequentially in a form of a stream, has important applications in cyber security and social networks. Due to the sheer volume and highly dynamic nature of graph streams, the practical way of handling them is by summarization. Given a graph stream G, directed or undirected, the problem of graph stream summarization is to summarize G as SG with a much smaller (sublinear) space, linear construction time and constant maintenance cost for each edge update, such that SG allows many queries over G to be approximately conducted efficiently. The widely used practice of summarizing data streams is to treat each stream element independently by e.g., hash-or sample-based methods, without maintaining the connections (or relationships) between elements. Hence, existing methods can only solve ad-hoc problems, without supporting diversified and complicated analytics over graph streams. We present TCM, a novel graph stream summary. Given an incoming edge, it summarizes both node and edge information in constant time. Consequently, the summary forms a graphical sketch where edges capture the connections inside elements, and nodes maintain relationships across elements. We discuss a wide range of supported queries and establish some error bounds. In addition, we experimentally show that TCM can effectively and efficiently support analytics over graph streams beyond the power of existing sketches, which demonstrates its potential to start a new line of research and applications in graph stream management.

KW - Data streams

KW - Graph streams

KW - Sketch

KW - Summarization

UR - http://www.scopus.com/inward/record.url?scp=84979656018&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84979656018&partnerID=8YFLogxK

U2 - 10.1145/2882903.2915223

DO - 10.1145/2882903.2915223

M3 - Conference contribution

VL - 26-June-2016

SP - 1481

EP - 1496

BT - SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data

PB - Association for Computing Machinery

ER -