KSC-net community detection for big data networks

Raghvendra Mall, Johan A.K. Suykens

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

In this chapter, we demonstrate the applicability of the kernel spectral clustering (KSC) method for community detection in Big Data networks. We give a practical exposition of the KSC method [1] on large-scale synthetic and real-world networks with up to 10^6 nodes and 10^7 edges. The KSC method uses a primal-dual framework to construct a model on a smaller subset of the Big Data network. Because the original large-scale kernel matrix cannot fit in memory, we select smaller subgraphs using the fast and unique representative subset (FURS) selection technique proposed in Reference 2. These subsets are used for training and validation, respectively, to build the model and obtain the model parameters. The resulting model has a powerful out-of-sample extension property, which allows the community affiliation of unseen nodes to be inferred. The KSC model requires a kernel function, which can have kernel parameters, and the number of clusters k in the network must also be identified. A memory-efficient and computationally efficient model selection technique named balanced angular fitting (BAF), based on angular similarity in the eigenspace, is proposed in Reference 1. Another, parameter-free KSC model is proposed in Reference 3. There, the model selection technique exploits the structure of the projections in the eigenspace to automatically identify the number of clusters, and the authors suggest that a normalized linear kernel is sufficient for networks with millions of nodes. This model selection technique uses the concepts of entropy and balanced clusters to identify the number of clusters k. We then describe our software, KSC-net, which obtains the representative subset by FURS, builds the KSC model, performs one of the two model selection techniques (BAF or parameter-free), and uses out-of-sample extensions to assign community affiliations across the Big Data network.
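The train-then-extend idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the KSC-net software and does not reproduce its interface: the function names (`normalized_linear_kernel`, `ksc_train`, `ksc_predict`, `two_clique_graph`) are hypothetical, the training nodes are picked by hand rather than by FURS, k is fixed rather than chosen by BAF or the entropy-based criterion, and the eigenproblem follows the standard KSC dual formulation with a normalized linear (cosine) kernel on adjacency rows, which is assumed here rather than taken from this record.

```python
import numpy as np

def normalized_linear_kernel(A, B):
    """Cosine similarity between rows of A and rows of B (adjacency rows)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    norm_a = np.linalg.norm(A, axis=1, keepdims=True)
    norm_b = np.linalg.norm(B, axis=1, keepdims=True)
    norm_a[norm_a == 0] = 1.0  # guard against isolated nodes
    norm_b[norm_b == 0] = 1.0
    return (A / norm_a) @ (B / norm_b).T

def ksc_train(X_train, k):
    """Solve the (assumed) KSC dual problem D^-1 M_D Omega alpha = lambda alpha."""
    X_train = np.asarray(X_train, dtype=float)
    omega = normalized_linear_kernel(X_train, X_train)
    d_inv = 1.0 / omega.sum(axis=1)  # inverse degrees of the kernel graph
    n = omega.shape[0]
    # Weighted centering matrix M_D = I - 1 1^T D^-1 / (1^T D^-1 1)
    centering = np.eye(n) - np.outer(np.ones(n), d_inv) / d_inv.sum()
    eigvals, eigvecs = np.linalg.eig((d_inv[:, None] * centering) @ omega)
    top = np.argsort(-eigvals.real, kind="stable")[: k - 1]
    alpha = eigvecs[:, top].real  # k - 1 leading dual vectors
    bias = -(d_inv @ omega @ alpha) / d_inv.sum()
    # Sign patterns of the training projections form the cluster codebook
    signs = np.where(omega @ alpha + bias >= 0, 1, -1)
    patterns, counts = np.unique(signs, axis=0, return_counts=True)
    codebook = patterns[np.argsort(-counts, kind="stable")[:k]]
    return {"X_train": X_train, "alpha": alpha, "bias": bias, "codebook": codebook}

def ksc_predict(model, X_new):
    """Out-of-sample extension: project new nodes, match signs to the codebook."""
    omega_new = normalized_linear_kernel(X_new, model["X_train"])
    signs = np.where(omega_new @ model["alpha"] + model["bias"] >= 0, 1, -1)
    hamming = (signs[:, None, :] != model["codebook"][None, :, :]).sum(axis=2)
    return hamming.argmin(axis=1)

def two_clique_graph(size):
    """Toy network: two cliques of `size` nodes joined by a single edge."""
    n = 2 * size
    adj = np.zeros((n, n))
    adj[:size, :size] = 1
    adj[size:, size:] = 1
    np.fill_diagonal(adj, 0)
    adj[size - 1, size] = adj[size, size - 1] = 1
    return adj

adj = two_clique_graph(6)
train_idx = [0, 1, 2, 6, 7, 8]    # hand-picked stand-in for a FURS subset
test_idx = [3, 4, 5, 9, 10, 11]   # nodes never seen during training
model = ksc_train(adj[train_idx], k=2)
train_labels = ksc_predict(model, adj[train_idx])
test_labels = ksc_predict(model, adj[test_idx])
```

Only the 6 x 6 training kernel matrix is ever decomposed; the unseen nodes need just their kernel values against the training nodes, which is what makes the approach attractive when the full kernel matrix does not fit in memory. The numeric cluster labels are arbitrary because the sign of an eigenvector is arbitrary, so only the grouping is meaningful.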

Original language: English
Title of host publication: Big Data
Subtitle of host publication: Algorithms, Analytics, and Applications
Publisher: CRC Press
Pages: 157-173
Number of pages: 17
ISBN (Electronic): 9781482240566
ISBN (Print): 9781482240559
Publication status: Published - 1 Jan 2015

Fingerprint

Community Detection
Spectral Clustering
kernel
Model Selection
Number of Clusters
Spectral Methods
Clustering Methods
Subset
Eigenspace
Vertex of a graph
Big data
Model
Data storage equipment
Subset Selection
Primal-dual
Kernel Function
Subgraph
Entropy
Projection
Sufficient

ASJC Scopus subject areas

  • Computer Science (all)
  • Mathematics (all)

Cite this

Mall, R., & Suykens, J. A. K. (2015). KSC-net community detection for big data networks. In Big Data: Algorithms, Analytics, and Applications (pp. 157-173). CRC Press.
