Mining communities in networks

A solution for consistency and its evaluation

Haewoon Kwak, Yoonchan Choi, Young Ho Eom, Hawoong Jeong, Sue Moon

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

43 Citations (Scopus)

Abstract

Online social networks pose significant challenges to computer scientists, physicists, and sociologists alike because of their massive size, fast evolution, and uncharted potential for social computing. One problem that has particularly interested us is community identification. Many algorithms based on various metrics have been proposed for identifying communities in networks [18, 24], but few of them scale to very large networks. Three recent community identification algorithms, namely CNM [16], Wakita [59], and Louvain [10], stand out for their scalability to networks of a few million nodes. All of them use modularity as the metric of optimization. However, all three algorithms produce inconsistent communities whenever the input ordering of nodes changes. We propose two quantitative metrics to represent the level of consistency across multiple runs of an algorithm: pairwise membership probability and consistency. Based on these two metrics, we propose a solution that improves consistency without compromising modularity. We demonstrate that our solution, which uses pairwise membership probabilities as link weights, generates consistent communities within six or fewer cycles for most networks. However, our iterative approach of reinforcing pairwise membership does not converge as well for the Flickr, Orkut, and Cyworld networks as it does for the rest of the networks. Our approach is empirically driven and has yet to be shown analytically to produce consistent output. We leave further investigation into the topological structure and its impact on consistency as future work. To evaluate the quality of the clustering, we examined 3 of the 48 communities identified in the AS graph. Surprisingly, each of them has a hierarchical, geographical, or topological interpretation of its grouping. Our preliminary evaluation of the quality of the communities is promising.
We plan to conduct a more thorough evaluation of the communities and to study network structures and their evolution using our approach.
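The abstract describes the consistency procedure only at a high level. The following is a minimal sketch of one possible reading of it, not the authors' implementation. It swaps in a simplified stand-in for CNM, Wakita, and Louvain: a single-level, Louvain-style local-moving pass that greedily maximizes modularity and visits nodes in a given order, so shuffled orders can give different partitions. It estimates each linked pair's membership probability as the fraction of shuffled runs that put both endpoints in the same community, then feeds those probabilities back as link weights until every run agrees. The function names (`local_moving`, `pairwise_membership`, `consistency`, `reinforce`), the run count, the cycle cap, and the consistency score used here (share of edges whose probability is exactly 0 or 1) are hypothetical assumptions; the paper's own definitions may differ.

```python
import random
from collections import defaultdict

# Hypothetical sketch, not the authors' code. A weighted graph is a dict
# mapping an undirected edge (u, v) to a non-negative weight.


def local_moving(weights, order, eps=1e-12):
    """Single-level, Louvain-style greedy modularity optimization.

    Nodes are visited in the given order, so different orders can yield
    different partitions. Returns {node: community_label}.
    """
    adj = defaultdict(dict)
    for (u, v), w in weights.items():
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    k = {n: sum(adj[n].values()) for n in order}  # weighted degree
    m = sum(weights.values())                      # total edge weight
    comm = {n: n for n in order}                   # start from singletons
    if m == 0:
        return comm
    tot = dict(k)  # sum of degrees inside each community
    moved = True
    while moved:
        moved = False
        for i in order:
            ci = comm[i]
            tot[ci] -= k[i]  # take i out of its current community
            links = defaultdict(float)  # weight from i into each community
            for j, w in adj[i].items():
                links[comm[j]] += w
            # Modularity gain of inserting i into community c, up to the
            # constant factor 1/m: k_i,in(c) - tot(c) * k_i / (2m).
            scale = k[i] / (2 * m)
            best, best_gain = ci, links[ci] - tot[ci] * scale
            for c, w_in in list(links.items()):
                g = w_in - tot[c] * scale
                if g > best_gain + eps:  # strict improvement only
                    best, best_gain = c, g
            tot[best] += k[i]
            comm[i] = best
            moved = moved or best != ci
    return comm


def pairwise_membership(weights, nodes, runs, rng):
    """Fraction of runs (each with a shuffled node order) in which the two
    endpoints of every edge land in the same community."""
    same = dict.fromkeys(weights, 0)
    for _ in range(runs):
        order = list(nodes)
        rng.shuffle(order)
        comm = local_moving(weights, order)
        for u, v in weights:
            if comm[u] == comm[v]:
                same[(u, v)] += 1
    return {e: same[e] / runs for e in weights}


def consistency(probs):
    """Hypothetical score: share of edges whose membership probability is
    exactly 0 or 1, i.e. edges on which every run agreed."""
    settled = sum(1 for p in probs.values() if p in (0.0, 1.0))
    return settled / len(probs)


def reinforce(weights, runs=20, max_cycles=10, seed=0):
    """Re-run the detector, using the previous cycle's pairwise membership
    probabilities as the new link weights, until all runs agree."""
    rng = random.Random(seed)
    nodes = sorted({n for edge in weights for n in edge})
    for cycle in range(1, max_cycles + 1):
        probs = pairwise_membership(weights, nodes, runs, rng)
        score = consistency(probs)
        if score == 1.0:
            break
        weights = probs  # reinforce: probabilities become link weights
    return cycle, score, probs
```

On two disconnected triangles every shuffled run recovers the same two communities, so `reinforce` stops in the first cycle with a score of 1.0. Probabilities are computed only for linked pairs here, because only those can be reused as link weights; the paper's metric may be defined over all node pairs.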

Original language: English
Title of host publication: Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC
Pages: 301-314
Number of pages: 14
DOI: https://doi.org/10.1145/1644893.1644930
Publication status: Published - 1 Dec 2009
Externally published: Yes
Event: 2009 9th ACM SIGCOMM Internet Measurement Conference, IMC 2009 - Chicago, IL, United States
Duration: 4 Nov 2009 to 6 Nov 2009

Keywords

  • AS graph
  • CNM
  • Community
  • Consistent community identification
  • Louvain
  • Modularity
  • Social networks
  • Wakita

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications

Cite this

Kwak, H., Choi, Y., Eom, Y. H., Jeong, H., & Moon, S. (2009). Mining communities in networks: A solution for consistency and its evaluation. In Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC (pp. 301-314). https://doi.org/10.1145/1644893.1644930

Mining communities in networks: A solution for consistency and its evaluation. / Kwak, Haewoon; Choi, Yoonchan; Eom, Young Ho; Jeong, Hawoong; Moon, Sue.

Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC. 2009. p. 301-314.

Kwak, H, Choi, Y, Eom, YH, Jeong, H & Moon, S 2009, Mining communities in networks: A solution for consistency and its evaluation. in Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC. pp. 301-314, 2009 9th ACM SIGCOMM Internet Measurement Conference, IMC 2009, Chicago, IL, United States, 4/11/09. https://doi.org/10.1145/1644893.1644930
Kwak H, Choi Y, Eom YH, Jeong H, Moon S. Mining communities in networks: A solution for consistency and its evaluation. In Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC. 2009. p. 301-314 https://doi.org/10.1145/1644893.1644930
Kwak, Haewoon; Choi, Yoonchan; Eom, Young Ho; Jeong, Hawoong; Moon, Sue. / Mining communities in networks: A solution for consistency and its evaluation. Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC. 2009. pp. 301-314
@inproceedings{fd6c92fdc69d4caa9d2a47ee83b26671,
title = "Mining communities in networks: A solution for consistency and its evaluation",
keywords = "AS graph, CNM, Community, Consistent community identification, Louvain, Modularity, Social networks, Wakita",
author = "Haewoon Kwak and Yoonchan Choi and Eom, {Young Ho} and Hawoong Jeong and Sue Moon",
year = "2009",
month = "12",
day = "1",
doi = "10.1145/1644893.1644930",
language = "English",
isbn = "9781605587707",
pages = "301--314",
booktitle = "Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC",

}

TY - GEN

T1 - Mining communities in networks

T2 - A solution for consistency and its evaluation

AU - Kwak, Haewoon

AU - Choi, Yoonchan

AU - Eom, Young Ho

AU - Jeong, Hawoong

AU - Moon, Sue

PY - 2009/12/1

Y1 - 2009/12/1

KW - AS graph

KW - CNM

KW - Community

KW - Consistent community identification

KW - Louvain

KW - Modularity

KW - Social networks

KW - Wakita

UR - http://www.scopus.com/inward/record.url?scp=84870691488&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84870691488&partnerID=8YFLogxK

U2 - 10.1145/1644893.1644930

DO - 10.1145/1644893.1644930

M3 - Conference contribution

SN - 9781605587707

SP - 301

EP - 314

BT - Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC

ER -