Pagrol: Parallel graph OLAP over large-scale attributed graphs

Zhengkui Wang, Qi Fan, Huiju Wang, Kian Lee Tan, Divyakant Agrawal, Amr El Abbadi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

34 Citations (Scopus)

Abstract

Attributed graphs are becoming important tools for modeling information networks, such as the Web and various social networks (e.g., Facebook, LinkedIn, Twitter). However, managing and analyzing attributed graphs to support effective decision making is computationally challenging. In this paper, we propose Pagrol, a parallel graph OLAP (Online Analytical Processing) system over attributed graphs. In particular, Pagrol introduces a new conceptual Hyper Graph Cube model (an attributed-graph analogue of the data cube model for relational DBMS) to aggregate attributed graphs at different granularities and levels. The proposed model supports different queries as well as a new set of graph OLAP Roll-Up/Drill-Down operations. Furthermore, on the basis of Hyper Graph Cube, Pagrol provides an efficient MapReduce-based parallel graph cubing algorithm, MRGraph-Cubing, to compute the graph cube for an attributed graph. Pagrol employs numerous optimization techniques: (a) a self-contained join strategy to minimize I/O cost; (b) a scheme that groups cuboids into batches so as to minimize redundant computations; (c) a cost-based scheme to allocate the batches into bags (each with a small number of batches); and (d) an efficient scheme to process a bag using a single MapReduce job. Results of extensive experimental studies using both real Facebook and synthetic datasets on a 128-node cluster show that Pagrol is effective, efficient, and scalable.
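To make the idea of aggregating an attributed graph "at different granularities" concrete, the following is a minimal single-machine sketch, not the MRGraph-Cubing algorithm or the Hyper Graph Cube model as defined in the paper. It assumes a toy undirected graph whose vertices carry two hypothetical attributes (`gender`, `city`), treats each subset of those attributes as one cuboid, and counts vertices and edges per group. The names `aggregate` and `cuboids` and all data are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy attributed graph: vertex id -> attribute values.
vertices = {
    1: {"gender": "F", "city": "SG"},
    2: {"gender": "M", "city": "SG"},
    3: {"gender": "F", "city": "NY"},
    4: {"gender": "M", "city": "NY"},
}
# Edges are assumed to be undirected for this sketch.
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (1, 4)]


def aggregate(vertices, edges, dims):
    """Collapse vertices that agree on `dims` into one group.

    Returns (vertex count per group, edge count per unordered group pair).
    """
    def key(v):
        return tuple(vertices[v][d] for d in dims)

    node_counts = Counter(key(v) for v in vertices)
    edge_counts = Counter(
        tuple(sorted((key(u), key(v)))) for u, v in edges
    )
    return node_counts, edge_counts


def cuboids(dimensions):
    """Enumerate every subset of the dimensions, finest granularity first."""
    for k in range(len(dimensions), -1, -1):
        yield from combinations(dimensions, k)


if __name__ == "__main__":
    for dims in cuboids(("gender", "city")):
        nodes, links = aggregate(vertices, edges, dims)
        print(dims, dict(nodes), dict(links))
```

In this sketch, moving from the `("gender", "city")` cuboid to the `("gender",)` cuboid corresponds to a roll-up, and the reverse direction to a drill-down. Each cuboid is recomputed from the base graph here; the batching and bag-allocation optimizations described in the abstract aim to avoid exactly that kind of redundant work and are not reproduced.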

Original language: English
Title of host publication: Proceedings - International Conference on Data Engineering
Publisher: IEEE Computer Society
Pages: 496-507
Number of pages: 12
ISBN (Print): 9781479925544
DOIs: https://doi.org/10.1109/ICDE.2014.6816676
Publication status: Published - 1 Jan 2014
Externally published: Yes
Event: 30th IEEE International Conference on Data Engineering, ICDE 2014 - Chicago, IL, United States
Duration: 31 Mar 2014 to 4 Apr 2014

Other

Other: 30th IEEE International Conference on Data Engineering, ICDE 2014
Country: United States
City: Chicago, IL
Period: 31/3/14 to 4/4/14

Fingerprint

Processing
Costs
Decision making

ASJC Scopus subject areas

  • Information Systems
  • Signal Processing
  • Software

Cite this

Wang, Z., Fan, Q., Wang, H., Tan, K. L., Agrawal, D., & El Abbadi, A. (2014). Pagrol: Parallel graph OLAP over large-scale attributed graphs. In Proceedings - International Conference on Data Engineering (pp. 496-507). [6816676] IEEE Computer Society. https://doi.org/10.1109/ICDE.2014.6816676
