Efficient semi-streaming algorithms for local triangle counting in massive graphs

Luca Becchetti, Paolo Boldi, Carlos Castillo, Aristides Gionis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

154 Citations (Scopus)

Abstract

In this paper we study the problem of local triangle counting in large graphs. Namely, given a large graph G = (V, E) we want to estimate as accurately as possible the number of triangles incident to every node v ∈ V in the graph. The problem of computing the global number of triangles in a graph has been considered before, but to our knowledge this is the first paper that addresses the problem of local triangle counting with a focus on the efficiency issues arising in massive graphs. The distribution of the local number of triangles and the related local clustering coefficient can be used in many interesting applications. For example, we show that the measures we compute can help to detect the presence of spamming activity in large-scale Web graphs, as well as to provide useful features to assess content quality in social networks. For computing the local number of triangles we propose two approximation algorithms, which are based on the idea of min-wise independent permutations (Broder et al. 1998). Our algorithms operate in a semi-streaming fashion, using O(|V|) space in main memory and performing O(log |V|) sequential scans over the edges of the graph. The first algorithm we describe in this paper also uses O(|E|) space in external memory during computation, while the second algorithm uses only main memory. We present the theoretical analysis as well as experimental results in massive graphs demonstrating the practical efficiency of our approach.
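
The min-wise idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `num_passes` and `seed` parameters, the random floating-point labels used in place of a min-wise independent permutation family, and the in-memory dictionary standing in for external-memory per-edge counters are all assumptions made for illustration. The sketch relies on two standard facts: the probability that two neighbor sets share the same minimum label equals their Jaccard coefficient J, and the number of common neighbors of an edge (u, v) is J / (1 + J) * (deg(u) + deg(v)). Each round scans the edge list twice and keeps only per-node state between scans. The input is assumed to list each undirected edge once, with no self-loops.

```python
import random
from collections import defaultdict


def exact_local_triangles(edges):
    """Exact per-node triangle counts, for comparison on small graphs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Every triangle at u is counted once for each of its two edges at u.
    return {u: sum(len(adj[u] & adj[v]) for v in adj[u]) // 2 for u in adj}


def estimate_local_triangles(edges, num_passes=200, seed=0):
    """Min-hash estimate of per-node triangle counts (illustrative sketch)."""
    rng = random.Random(seed)
    nodes = sorted({x for edge in edges for x in edge})

    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Per-edge agreement counters; assumption: kept in memory here for brevity.
    matches = defaultdict(int)

    for _ in range(num_passes):
        # Random labels stand in for one random permutation of the nodes.
        label = {x: rng.random() for x in nodes}
        min_label = {x: float("inf") for x in nodes}

        # Scan 1: each node keeps the smallest label among its neighbors.
        for u, v in edges:
            min_label[u] = min(min_label[u], label[v])
            min_label[v] = min(min_label[v], label[u])

        # Scan 2: endpoints agree exactly when the minimum of the union of
        # their neighbor sets lies in the intersection.
        for u, v in edges:
            if min_label[u] == min_label[v]:
                matches[(u, v)] += 1

    estimate = defaultdict(float)
    for u, v in edges:
        jaccard = matches[(u, v)] / num_passes
        common = jaccard / (1.0 + jaccard) * (degree[u] + degree[v])
        # Each common neighbor closes one triangle on edge (u, v); halving
        # compensates for the triangle being seen from two edges per node.
        estimate[u] += common / 2.0
        estimate[v] += common / 2.0
    return dict(estimate)


if __name__ == "__main__":
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(exact_local_triangles(k4))
    print(estimate_local_triangles(k4, num_passes=500))
```

Increasing `num_passes` tightens the Jaccard estimates at the cost of more scans over the edge list; the right number of scans for a target accuracy is analysed in the paper itself and is not reproduced here.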

Original language: English
Title of host publication: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages: 16-24
Number of pages: 9
DOIs: https://doi.org/10.1145/1401890.1401898
Publication status: Published - 1 Dec 2008
Externally published: Yes
Event: 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008 - Las Vegas, NV, United States
Duration: 24 Aug 2008 to 27 Aug 2008

Other

Other: 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008
Country: United States
City: Las Vegas, NV
Period: 24/8/08 to 27/8/08

Fingerprint

Data storage equipment
Spamming
Approximation algorithms

Keywords

  • Graph mining
  • Probabilistic algorithms
  • Semi-streaming

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Becchetti, L., Boldi, P., Castillo, C., & Gionis, A. (2008). Efficient semi-streaming algorithms for local triangle counting in massive graphs. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 16-24) https://doi.org/10.1145/1401890.1401898

@inproceedings{c1ce895b67a44f3084abf917bac9cd64,
title = "Efficient semi-streaming algorithms for local triangle counting in massive graphs",
abstract = "In this paper we study the problem of local triangle counting in large graphs. Namely, given a large graph G = (V, E) we want to estimate as accurately as possible the number of triangles incident to every node v ∈ V in the graph. The problem of computing the global number of triangles in a graph has been considered before, but to our knowledge this is the first paper that addresses the problem of local triangle counting with a focus on the efficiency issues arising in massive graphs. The distribution of the local number of triangles and the related local clustering coefficient can be used in many interesting applications. For example, we show that the measures we compute can help to detect the presence of spamming activity in large-scale Web graphs, as well as to provide useful features to assess content quality in social networks. For computing the local number of triangles we propose two approximation algorithms, which are based on the idea of min-wise independent permutations (Broder et al. 1998). Our algorithms operate in a semi-streaming fashion, using O(|V|) space in main memory and performing O(log |V|) sequential scans over the edges of the graph. The first algorithm we describe in this paper also uses O(|E|) space in external memory during computation, while the second algorithm uses only main memory. We present the theoretical analysis as well as experimental results in massive graphs demonstrating the practical efficiency of our approach.",
keywords = "Graph mining, Probabilistic algorithms, Semi-streaming",
author = "Luca Becchetti and Paolo Boldi and Carlos Castillo and Aristides Gionis",
year = "2008",
month = "12",
day = "1",
doi = "10.1145/1401890.1401898",
language = "English",
isbn = "9781605581934",
pages = "16--24",
booktitle = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

}

TY - GEN

T1 - Efficient semi-streaming algorithms for local triangle counting in massive graphs

AU - Becchetti, Luca

AU - Boldi, Paolo

AU - Castillo, Carlos

AU - Gionis, Aristides

PY - 2008/12/1

Y1 - 2008/12/1

AB - In this paper we study the problem of local triangle counting in large graphs. Namely, given a large graph G = (V, E) we want to estimate as accurately as possible the number of triangles incident to every node v ∈ V in the graph. The problem of computing the global number of triangles in a graph has been considered before, but to our knowledge this is the first paper that addresses the problem of local triangle counting with a focus on the efficiency issues arising in massive graphs. The distribution of the local number of triangles and the related local clustering coefficient can be used in many interesting applications. For example, we show that the measures we compute can help to detect the presence of spamming activity in large-scale Web graphs, as well as to provide useful features to assess content quality in social networks. For computing the local number of triangles we propose two approximation algorithms, which are based on the idea of min-wise independent permutations (Broder et al. 1998). Our algorithms operate in a semi-streaming fashion, using O(|V|) space in main memory and performing O(log |V|) sequential scans over the edges of the graph. The first algorithm we describe in this paper also uses O(|E|) space in external memory during computation, while the second algorithm uses only main memory. We present the theoretical analysis as well as experimental results in massive graphs demonstrating the practical efficiency of our approach.

KW - Graph mining

KW - Probabilistic algorithms

KW - Semi-streaming

UR - http://www.scopus.com/inward/record.url?scp=61649118133&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=61649118133&partnerID=8YFLogxK

U2 - 10.1145/1401890.1401898

DO - 10.1145/1401890.1401898

M3 - Conference contribution

SN - 9781605581934

SP - 16

EP - 24

BT - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

ER -