KSC-net community detection for big data networks

Raghvendra Mall, Johan A.K. Suykens

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Abstract

In this chapter, we demonstrate the applicability of the kernel spectral clustering (KSC) method for community detection in Big Data networks. We give a practical exposition of the KSC method [1] on large-scale synthetic and real-world networks with up to 10^6 nodes and 10^7 edges. The KSC method uses a primal-dual framework to construct a model on a smaller subset of the Big Data network. Because the original large-scale kernel matrix cannot fit in memory, we select smaller subgraphs using the fast and unique representative subset (FURS) selection technique proposed in Reference 2. These subsets are used for training and validation to build the model and obtain the model parameters. The resulting model has a powerful out-of-sample extension property, which allows the community affiliation of unseen nodes to be inferred. The KSC model requires a kernel function, which can have kernel parameters, and the number of clusters k in the network must also be identified. A memory-efficient and computationally efficient model selection technique named balanced angular fitting (BAF), based on angular similarity in the eigenspace, is proposed in Reference 1. Another, parameter-free KSC model is proposed in Reference 3. There, the model selection technique exploits the structure of the projections in the eigenspace to automatically identify the number of clusters, and suggests that a normalized linear kernel is sufficient for networks with millions of nodes. This model selection technique uses the concepts of entropy and balanced clusters to identify the number of clusters k. We then describe our software, called KSC-net, which obtains the representative subset by FURS, builds the KSC model, performs one of the two model selection techniques (BAF or parameter-free), and uses out-of-sample extensions to assign community affiliations to the nodes of the Big Data network.
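As a rough illustration of two ingredients named in the abstract, the sketch below shows a greedy, degree-based subset selection in the spirit of FURS and a normalized linear kernel between two nodes' adjacency rows. It is a simplified sketch under stated assumptions, not the KSC-net implementation: the function names, the adjacency-dictionary input, the tie-breaking rule, and the reactivation step are all illustrative choices, and the published FURS procedure may differ in its details.

```python
from math import sqrt


def select_representative_subset(adj, size):
    """Pick up to `size` nodes, favouring high degree and spreading the picks.

    `adj` maps each node to the set of its neighbours (undirected,
    unweighted graph). This is a simplified, hypothetical variant of a
    FURS-style selection, not the authors' exact algorithm.
    """
    degree = {node: len(neighbours) for node, neighbours in adj.items()}
    selected = []
    chosen = set()
    active = set(adj)  # nodes currently eligible for selection

    while len(selected) < min(size, len(adj)):
        if not active:
            # Every unchosen node was deactivated: make them eligible again.
            active = set(adj) - chosen
        # Highest degree first; ties broken by the smaller node label.
        best = min(active, key=lambda node: (-degree[node], node))
        selected.append(best)
        chosen.add(best)
        # Deactivate the chosen node and its neighbours so that the next
        # pick comes from a different region of the graph.
        active -= adj[best]
        active.discard(best)
    return selected


def normalized_linear_kernel(adj, u, v):
    """Cosine similarity between the binary adjacency rows of u and v."""
    if not adj[u] or not adj[v]:
        return 0.0
    shared = len(adj[u] & adj[v])
    return shared / sqrt(len(adj[u]) * len(adj[v]))
```

In a full pipeline of the kind the abstract describes, the selected nodes would induce the training subgraph, the kernel would be evaluated between their adjacency rows, and unseen nodes would be assigned through the out-of-sample extension; those steps are not shown here.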

Original language: English
Title of host publication: Big Data
Subtitle of host publication: Algorithms, Analytics, and Applications
Publisher: CRC Press
Pages: 157-173
Number of pages: 17
ISBN (Electronic): 9781482240566
ISBN (Print): 9781482240559
Publication status: Published - 1 Jan 2015

ASJC Scopus subject areas

  • Computer Science(all)
  • Mathematics(all)

Cite this

Mall, R., & Suykens, J. A. K. (2015). KSC-net community detection for big data networks. In Big Data: Algorithms, Analytics, and Applications (pp. 157-173). CRC Press.