HubPPR: Effective indexing for approximate personalized pagerank

Sibo Wang, Youze Tang, Xiaokui Xiao, Yin Yang, Zengxiang Li

Research output: Contribution to journalConference article

15 Citations (Scopus)

Abstract

Personalized PageRank (PPR) computation is a fundamental operation in web search, social networks, and graph analysis. Given a graph G, a source s, and a target t, the PPR query π(s, t) returns the probability that a random walk on G starting from s terminates at t. Unlike global PageRank which can be effectively pre-computed and materialized, the PPR result depends on both the source and the target, rendering results materialization infeasible for large graphs. Existing indexing techniques have rather limited effectiveness; in fact, the current state-of-the-art solution, BiPPR, answers individual PPR queries without pre-computation or indexing, and yet it outperforms all previous index-based solutions. Motivated by this, we propose HubPPR, an effective indexing scheme for PPR computation with controllable tradeoffs for accuracy, query time, and memory consumption. The main idea is to pre-compute and index auxiliary information for selected hub nodes that are often involved in PPR processing. Going one step further, we extend HubPPR to answer top-k PPR queries, which returns the k nodes with the highest PPR values with respect to a source s, among a given set T of target nodes. Extensive experiments demonstrate that compared to the current best solution BiPPR, HubPPR achieves up to 10x and 220x speedup for PPR and top-k PPR processing, respectively, with moderate memory consumption. Notably, with a single commodity server, HubPPR answers a top-k PPR query in seconds on graphs with billions of edges, with high accuracy and strong result quality guarantees.

Original languageEnglish
Pages (from-to)205-216
Number of pages12
JournalProceedings of the VLDB Endowment
Volume10
Issue number3
DOIs
Publication statusPublished - 1 Jan 2016

Fingerprint

Data storage equipment
Processing
Servers
Experiments

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science(all)

Cite this

HubPPR : Effective indexing for approximate personalized pagerank. / Wang, Sibo; Tang, Youze; Xiao, Xiaokui; Yang, Yin; Li, Zengxiang.

In: Proceedings of the VLDB Endowment, Vol. 10, No. 3, 01.01.2016, p. 205-216.

Research output: Contribution to journalConference article

Wang, Sibo ; Tang, Youze ; Xiao, Xiaokui ; Yang, Yin ; Li, Zengxiang. / HubPPR : Effective indexing for approximate personalized pagerank. In: Proceedings of the VLDB Endowment. 2016 ; Vol. 10, No. 3. pp. 205-216.
@article{edcee470797b482fbe13d00eeb2722f9,
title = "HubPPR: Effective indexing for approximate personalized pagerank",
abstract = "Personalized PageRank (PPR) computation is a fundamental operation in web search, social networks, and graph analysis. Given a graph G, a source s, and a target t, the PPR query π(s, t) returns the probability that a random walk on G starting from s terminates at t. Unlike global PageRank which can be effectively pre-computed and materialized, the PPR result depends on both the source and the target, rendering results materialization infeasible for large graphs. Existing indexing techniques have rather limited effectiveness; in fact, the current state-of-the-art solution, BiPPR, answers individual PPR queries without pre-computation or indexing, and yet it outperforms all previous index-based solutions. Motivated by this, we propose HubPPR, an effective indexing scheme for PPR computation with controllable tradeoffs for accuracy, query time, and memory consumption. The main idea is to pre-compute and index auxiliary information for selected hub nodes that are often involved in PPR processing. Going one step further, we extend HubPPR to answer top-k PPR queries, which returns the k nodes with the highest PPR values with respect to a source s, among a given set T of target nodes. Extensive experiments demonstrate that compared to the current best solution BiPPR, HubPPR achieves up to 10x and 220x speedup for PPR and top-k PPR processing, respectively, with moderate memory consumption. Notably, with a single commodity server, HubPPR answers a top-k PPR query in seconds on graphs with billions of edges, with high accuracy and strong result quality guarantees.",
author = "Sibo Wang and Youze Tang and Xiaokui Xiao and Yin Yang and Zengxiang Li",
year = "2016",
month = "1",
day = "1",
doi = "10.14778/3021924.3021936",
language = "English",
volume = "10",
pages = "205--216",
journal = "Proceedings of the VLDB Endowment",
issn = "2150-8097",
publisher = "Very Large Data Base Endowment Inc.",
number = "3",

}

TY - JOUR

T1 - HubPPR

T2 - Effective indexing for approximate personalized pagerank

AU - Wang, Sibo

AU - Tang, Youze

AU - Xiao, Xiaokui

AU - Yang, Yin

AU - Li, Zengxiang

PY - 2016/1/1

Y1 - 2016/1/1

N2 - Personalized PageRank (PPR) computation is a fundamental operation in web search, social networks, and graph analysis. Given a graph G, a source s, and a target t, the PPR query π(s, t) returns the probability that a random walk on G starting from s terminates at t. Unlike global PageRank which can be effectively pre-computed and materialized, the PPR result depends on both the source and the target, rendering results materialization infeasible for large graphs. Existing indexing techniques have rather limited effectiveness; in fact, the current state-of-the-art solution, BiPPR, answers individual PPR queries without pre-computation or indexing, and yet it outperforms all previous index-based solutions. Motivated by this, we propose HubPPR, an effective indexing scheme for PPR computation with controllable tradeoffs for accuracy, query time, and memory consumption. The main idea is to pre-compute and index auxiliary information for selected hub nodes that are often involved in PPR processing. Going one step further, we extend HubPPR to answer top-k PPR queries, which returns the k nodes with the highest PPR values with respect to a source s, among a given set T of target nodes. Extensive experiments demonstrate that compared to the current best solution BiPPR, HubPPR achieves up to 10x and 220x speedup for PPR and top-k PPR processing, respectively, with moderate memory consumption. Notably, with a single commodity server, HubPPR answers a top-k PPR query in seconds on graphs with billions of edges, with high accuracy and strong result quality guarantees.

AB - Personalized PageRank (PPR) computation is a fundamental operation in web search, social networks, and graph analysis. Given a graph G, a source s, and a target t, the PPR query π(s, t) returns the probability that a random walk on G starting from s terminates at t. Unlike global PageRank which can be effectively pre-computed and materialized, the PPR result depends on both the source and the target, rendering results materialization infeasible for large graphs. Existing indexing techniques have rather limited effectiveness; in fact, the current state-of-the-art solution, BiPPR, answers individual PPR queries without pre-computation or indexing, and yet it outperforms all previous index-based solutions. Motivated by this, we propose HubPPR, an effective indexing scheme for PPR computation with controllable tradeoffs for accuracy, query time, and memory consumption. The main idea is to pre-compute and index auxiliary information for selected hub nodes that are often involved in PPR processing. Going one step further, we extend HubPPR to answer top-k PPR queries, which returns the k nodes with the highest PPR values with respect to a source s, among a given set T of target nodes. Extensive experiments demonstrate that compared to the current best solution BiPPR, HubPPR achieves up to 10x and 220x speedup for PPR and top-k PPR processing, respectively, with moderate memory consumption. Notably, with a single commodity server, HubPPR answers a top-k PPR query in seconds on graphs with billions of edges, with high accuracy and strong result quality guarantees.

UR - http://www.scopus.com/inward/record.url?scp=85020475427&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85020475427&partnerID=8YFLogxK

U2 - 10.14778/3021924.3021936

DO - 10.14778/3021924.3021936

M3 - Conference article

AN - SCOPUS:85020475427

VL - 10

SP - 205

EP - 216

JO - Proceedings of the VLDB Endowment

JF - Proceedings of the VLDB Endowment

SN - 2150-8097

IS - 3

ER -