Efficient algorithms for large-scale local triangle counting

Luca Becchetti, Paolo Boldi, Carlos Castillo, Aristides Gionis

Research output: Contribution to journal › Article

38 Citations (Scopus)

Abstract

In this article, we study the problem of approximate local triangle counting in large graphs. Namely, given a large graph G = (V, E), we want to estimate as accurately as possible the number of triangles incident to every node v ∈ V in the graph. We consider the question for both undirected and directed graphs. The problem of computing the global number of triangles in a graph has been considered before, but to our knowledge this is the first contribution that addresses the problem of approximate local triangle counting with a focus on the efficiency issues arising in massive graphs and that also considers the directed case. The distribution of the local number of triangles and the related local clustering coefficient can be used in many interesting applications. For example, we show that the measures we compute can help detect the presence of spamming activity in large-scale Web graphs, and can provide useful features for content quality assessment in social networks. For computing the local number of triangles (undirected and directed), we propose two approximation algorithms, which are based on the idea of min-wise independent permutations [Broder et al. 1998]. Our algorithms operate in a semi-streaming fashion, using O(|V|) space in main memory and performing O(log|V|) sequential scans over the edges of the graph. The first algorithm we describe in this article also uses O(|E|) space of external memory during computation, while the second algorithm uses only main memory. We present the theoretical analysis as well as experimental results on large graphs, demonstrating the practical efficiency of our approach.
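The min-wise idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a simplified in-memory version that assumes an undirected graph given as a list of edges with each edge listed once, and the names `exact_local_triangles`, `approx_local_triangles`, `num_hashes`, and `seed` are hypothetical. Random floating-point labels stand in for a min-wise independent permutation. For each edge (u, v) the Jaccard coefficient J of the two neighbor sets is estimated from how often their minimum labels agree, the number of common neighbors is recovered as J / (1 + J) · (deg(u) + deg(v)), and half of that is credited to each endpoint.

```python
import random
from collections import defaultdict


def exact_local_triangles(edges):
    """Exact triangles per node, for comparison on small graphs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    # Each triangle at u is seen twice, once from each of its other two nodes.
    return {u: sum(len(adj[u] & adj[v]) for v in adj[u]) // 2 for u in adj}


def approx_local_triangles(edges, num_hashes=256, seed=0):
    """Min-hash estimate of triangles per node (illustrative sketch only)."""
    rng = random.Random(seed)
    edges = [(u, v) for u, v in edges if u != v]  # drop self-loops
    nodes = sorted({x for e in edges for x in e})

    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Per-edge agreement counters: O(|E|) state, kept in memory here.
    matches = defaultdict(int)
    for _ in range(num_hashes):
        # Assumption: random labels approximate a min-wise independent permutation.
        label = {x: rng.random() for x in nodes}
        min_label = {x: float("inf") for x in nodes}
        # Scan 1: smallest label among each node's neighbors.
        for u, v in edges:
            min_label[u] = min(min_label[u], label[v])
            min_label[v] = min(min_label[v], label[u])
        # Scan 2: count edges whose endpoints share the same minimum.
        for u, v in edges:
            if min_label[u] == min_label[v]:
                matches[(u, v)] += 1

    estimate = defaultdict(float)
    for u, v in edges:
        j = matches[(u, v)] / num_hashes               # estimated Jaccard coefficient
        common = j / (1 + j) * (degree[u] + degree[v])  # estimated common neighbors
        estimate[u] += common / 2
        estimate[v] += common / 2
    return dict(estimate)
```

On a complete graph with four nodes, every node lies in exactly three triangles, so the estimate for each node should land close to 3 and get tighter as `num_hashes` grows. The sketch keeps the edge list and per-edge counters in memory for brevity, whereas the abstract describes sequential scans over edges with O(|V|) main memory.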

Original language: English
Article number: 13
Journal: ACM Transactions on Knowledge Discovery from Data
Volume: 4
Issue number: 3
DOI: 10.1145/1839490.1839494
Publication status: Published - 1 Oct 2010
Externally published: Yes

Fingerprint

Data storage equipment
Spamming
Directed graphs
Approximation algorithms

Keywords

  • Clustering coefficient
  • Massive-graph computing
  • Social networks
  • Web computing

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Efficient algorithms for large-scale local triangle counting. / Becchetti, Luca; Boldi, Paolo; Castillo, Carlos; Gionis, Aristides.

In: ACM Transactions on Knowledge Discovery from Data, Vol. 4, No. 3, 13, 01.10.2010.

Becchetti, Luca ; Boldi, Paolo ; Castillo, Carlos ; Gionis, Aristides. / Efficient algorithms for large-scale local triangle counting. In: ACM Transactions on Knowledge Discovery from Data. 2010 ; Vol. 4, No. 3.
@article{e2651ae2718e4a5b966819ff5d72dd32,
title = "Efficient algorithms for large-scale local triangle counting",
abstract = "In this article, we study the problem of approximate local triangle counting in large graphs. Namely, given a large graph G = (V, E), we want to estimate as accurately as possible the number of triangles incident to every node v ∈ V in the graph. We consider the question for both undirected and directed graphs. The problem of computing the global number of triangles in a graph has been considered before, but to our knowledge this is the first contribution that addresses the problem of approximate local triangle counting with a focus on the efficiency issues arising in massive graphs and that also considers the directed case. The distribution of the local number of triangles and the related local clustering coefficient can be used in many interesting applications. For example, we show that the measures we compute can help detect the presence of spamming activity in large-scale Web graphs, and can provide useful features for content quality assessment in social networks. For computing the local number of triangles (undirected and directed), we propose two approximation algorithms, which are based on the idea of min-wise independent permutations [Broder et al. 1998]. Our algorithms operate in a semi-streaming fashion, using O(|V|) space in main memory and performing O(log|V|) sequential scans over the edges of the graph. The first algorithm we describe in this article also uses O(|E|) space of external memory during computation, while the second algorithm uses only main memory. We present the theoretical analysis as well as experimental results on large graphs, demonstrating the practical efficiency of our approach.",
keywords = "Clustering coefficient, Massive-graph computing, Social networks, Web computing",
author = "Luca Becchetti and Paolo Boldi and Carlos Castillo and Aristides Gionis",
year = "2010",
month = "10",
day = "1",
doi = "10.1145/1839490.1839494",
language = "English",
volume = "4",
journal = "ACM Transactions on Knowledge Discovery from Data",
issn = "1556-4681",
publisher = "Association for Computing Machinery (ACM)",
number = "3",

}

TY - JOUR

T1 - Efficient algorithms for large-scale local triangle counting

AU - Becchetti, Luca

AU - Boldi, Paolo

AU - Castillo, Carlos

AU - Gionis, Aristides

PY - 2010/10/1

Y1 - 2010/10/1

N2 - In this article, we study the problem of approximate local triangle counting in large graphs. Namely, given a large graph G = (V, E), we want to estimate as accurately as possible the number of triangles incident to every node v ∈ V in the graph. We consider the question for both undirected and directed graphs. The problem of computing the global number of triangles in a graph has been considered before, but to our knowledge this is the first contribution that addresses the problem of approximate local triangle counting with a focus on the efficiency issues arising in massive graphs and that also considers the directed case. The distribution of the local number of triangles and the related local clustering coefficient can be used in many interesting applications. For example, we show that the measures we compute can help detect the presence of spamming activity in large-scale Web graphs, and can provide useful features for content quality assessment in social networks. For computing the local number of triangles (undirected and directed), we propose two approximation algorithms, which are based on the idea of min-wise independent permutations [Broder et al. 1998]. Our algorithms operate in a semi-streaming fashion, using O(|V|) space in main memory and performing O(log|V|) sequential scans over the edges of the graph. The first algorithm we describe in this article also uses O(|E|) space of external memory during computation, while the second algorithm uses only main memory. We present the theoretical analysis as well as experimental results on large graphs, demonstrating the practical efficiency of our approach.

KW - Clustering coefficient

KW - Massive-graph computing

KW - Social networks

KW - Web computing

UR - http://www.scopus.com/inward/record.url?scp=78049335266&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78049335266&partnerID=8YFLogxK

U2 - 10.1145/1839490.1839494

DO - 10.1145/1839490.1839494

M3 - Article

AN - SCOPUS:78049335266

VL - 4

JO - ACM Transactions on Knowledge Discovery from Data

JF - ACM Transactions on Knowledge Discovery from Data

SN - 1556-4681

IS - 3

M1 - 13

ER -