Denoised Kernel Spectral data Clustering

Raghvendra Mall, Halima Bensmail, Rocco Langone, Carolina Varon, Johan A.K. Suykens

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Kernel Spectral Clustering (KSC) solves a weighted kernel principal component analysis problem in a primal-dual optimization framework. It builds an unsupervised model on a small subset of the data using the dual solution of the optimization problem. This gives KSC a powerful out-of-sample extension property, leading to good cluster generalization with respect to unseen data points. However, in the presence of noise that causes the data to overlap, the technique often fails to generalize well. In this paper, we propose a two-step process for clustering noisy data. We first denoise the data using kernel principal component analysis (KPCA) with a recently proposed Model selection criterion based on point-wise Distance Distributions (MDD) to recover the underlying information in the data. We then apply the KSC technique to the denoised data to obtain good-quality clusters. One advantage of model-based techniques is that the same training and validation sets can be used for denoising and for clustering. We found that the kernel bandwidth parameter obtained from MDD for KPCA also works efficiently with KSC, in combination with the optimal number of clusters k, to produce good-quality clusters. We compare the proposed approach with normal KSC and with KSC with KPCA using a heuristic method based on reconstruction error, on several synthetic and real-world datasets, to showcase the effectiveness of the proposed approach.
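
The two-step idea in the abstract (denoise with KPCA, then cluster the denoised points with the same kernel bandwidth) can be illustrated with a minimal sketch. It is not the authors' implementation: scikit-learn's `SpectralClustering` stands in for KSC and has no out-of-sample extension, the MDD criterion is not implemented, and the bandwidth `gamma`, the number of components, the ridge parameter `alpha`, and the helper name `denoise_then_cluster` are all assumptions chosen by hand for illustration.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.decomposition import KernelPCA


def denoise_then_cluster(X, n_clusters, gamma, n_components=2, seed=0):
    """Hypothetical helper: KPCA denoising followed by spectral clustering.

    The same RBF bandwidth `gamma` is reused in both steps, mirroring the
    abstract's observation; here it is fixed by hand, not selected by MDD.
    """
    # Step 1: project onto the leading kernel principal components and map
    # back to input space (approximate pre-image) to suppress noise.
    kpca = KernelPCA(
        n_components=n_components,
        kernel="rbf",
        gamma=gamma,
        fit_inverse_transform=True,
        alpha=0.1,  # ridge regularisation of the learned pre-image map
    )
    X_denoised = kpca.inverse_transform(kpca.fit_transform(X))

    # Step 2: cluster the denoised points with an RBF affinity.
    # Stand-in for KSC: plain spectral clustering, in-sample only.
    clusterer = SpectralClustering(
        n_clusters=n_clusters, affinity="rbf", gamma=gamma, random_state=seed
    )
    labels = clusterer.fit_predict(X_denoised)
    return X_denoised, labels


# Synthetic noisy data: two Gaussian blobs.
X, y = make_blobs(
    n_samples=150, centers=[[-3, 0], [3, 0]], cluster_std=1.0, random_state=0
)
X_denoised, labels = denoise_then_cluster(X, n_clusters=2, gamma=0.1)
```

In the paper's setting the bandwidth would come from the MDD criterion and the clustering model would be KSC trained on a subset of the data, so unseen points could be assigned to clusters without refitting; the sketch only shows how the two stages chain together.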

Original language: English
Title of host publication: 2016 International Joint Conference on Neural Networks, IJCNN 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3709-3716
Number of pages: 8
Volume: 2016-October
ISBN (Electronic): 9781509006199
DOIs: https://doi.org/10.1109/IJCNN.2016.7727677
Publication status: Published - 31 Oct 2016
Event: 2016 International Joint Conference on Neural Networks, IJCNN 2016 - Vancouver, Canada
Duration: 24 Jul 2016 to 29 Jul 2016

Other

Other: 2016 International Joint Conference on Neural Networks, IJCNN 2016
Country: Canada
City: Vancouver
Period: 24/7/16 to 29/7/16

Fingerprint

Principal component analysis
Heuristic methods
Bandwidth

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

Cite this

Mall, R., Bensmail, H., Langone, R., Varon, C., & Suykens, J. A. K. (2016). Denoised Kernel Spectral data Clustering. In 2016 International Joint Conference on Neural Networks, IJCNN 2016 (Vol. 2016-October, pp. 3709-3716). [7727677] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IJCNN.2016.7727677

Denoised Kernel Spectral data Clustering. / Mall, Raghvendra; Bensmail, Halima; Langone, Rocco; Varon, Carolina; Suykens, Johan A.K.

2016 International Joint Conference on Neural Networks, IJCNN 2016. Vol. 2016-October Institute of Electrical and Electronics Engineers Inc., 2016. p. 3709-3716 7727677.

Mall, R, Bensmail, H, Langone, R, Varon, C & Suykens, JAK 2016, Denoised Kernel Spectral data Clustering. in 2016 International Joint Conference on Neural Networks, IJCNN 2016. vol. 2016-October, 7727677, Institute of Electrical and Electronics Engineers Inc., pp. 3709-3716, 2016 International Joint Conference on Neural Networks, IJCNN 2016, Vancouver, Canada, 24/7/16. https://doi.org/10.1109/IJCNN.2016.7727677
Mall R, Bensmail H, Langone R, Varon C, Suykens JAK. Denoised Kernel Spectral data Clustering. In 2016 International Joint Conference on Neural Networks, IJCNN 2016. Vol. 2016-October. Institute of Electrical and Electronics Engineers Inc. 2016. p. 3709-3716. 7727677 https://doi.org/10.1109/IJCNN.2016.7727677
Mall, Raghvendra ; Bensmail, Halima ; Langone, Rocco ; Varon, Carolina ; Suykens, Johan A.K. / Denoised Kernel Spectral data Clustering. 2016 International Joint Conference on Neural Networks, IJCNN 2016. Vol. 2016-October Institute of Electrical and Electronics Engineers Inc., 2016. pp. 3709-3716
@inproceedings{6b6c6022f8e04e97b0ffcd742e02fe71,
title = "Denoised Kernel Spectral data Clustering",
abstract = "Kernel Spectral Clustering (KSC) solves a weighted kernel principal component analysis problem in a primal-dual optimization framework. It builds an unsupervised model on a small subset of data using the dual solution of the optimization problem. This allows KSC to have a powerful out-of-sample extension property leading to good cluster generalization w.r.t. unseen data points. However, in the presence of noise that causes overlapping data, the technique often fails to provide good generalization capability. In this paper, we propose a two-step process for clustering noisy data. We first denoise the data using kernel principal component analysis (KPCA) with a recently proposed Model selection criterion based on point-wise Distance Distributions (MDD) to obtain the underlying information in the data. We then use the KSC technique on this denoised data to obtain good quality clusters. One advantage of model based techniques is that we can use the same training and validation set for denoising and for clustering. We discovered that using the same kernel bandwidth parameter obtained from MDD for KPCA works efficiently with KSC in combination with the optimal number of clusters k to produce good quality clusters. We compare the proposed approach with normal KSC and KSC with KPCA using a heuristic method based on reconstruction error for several synthetic and real-world datasets to showcase the effectiveness of the proposed approach.",
author = "Raghvendra Mall and Halima Bensmail and Rocco Langone and Carolina Varon and Suykens, {Johan A.K.}",
year = "2016",
month = "10",
day = "31",
doi = "10.1109/IJCNN.2016.7727677",
language = "English",
volume = "2016-October",
pages = "3709--3716",
booktitle = "2016 International Joint Conference on Neural Networks, IJCNN 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY - GEN

T1 - Denoised Kernel Spectral data Clustering

AU - Mall, Raghvendra

AU - Bensmail, Halima

AU - Langone, Rocco

AU - Varon, Carolina

AU - Suykens, Johan A.K.

PY - 2016/10/31

Y1 - 2016/10/31

AB - Kernel Spectral Clustering (KSC) solves a weighted kernel principal component analysis problem in a primal-dual optimization framework. It builds an unsupervised model on a small subset of data using the dual solution of the optimization problem. This allows KSC to have a powerful out-of-sample extension property leading to good cluster generalization w.r.t. unseen data points. However, in the presence of noise that causes overlapping data, the technique often fails to provide good generalization capability. In this paper, we propose a two-step process for clustering noisy data. We first denoise the data using kernel principal component analysis (KPCA) with a recently proposed Model selection criterion based on point-wise Distance Distributions (MDD) to obtain the underlying information in the data. We then use the KSC technique on this denoised data to obtain good quality clusters. One advantage of model based techniques is that we can use the same training and validation set for denoising and for clustering. We discovered that using the same kernel bandwidth parameter obtained from MDD for KPCA works efficiently with KSC in combination with the optimal number of clusters k to produce good quality clusters. We compare the proposed approach with normal KSC and KSC with KPCA using a heuristic method based on reconstruction error for several synthetic and real-world datasets to showcase the effectiveness of the proposed approach.

UR - http://www.scopus.com/inward/record.url?scp=85007202545&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85007202545&partnerID=8YFLogxK

U2 - 10.1109/IJCNN.2016.7727677

DO - 10.1109/IJCNN.2016.7727677

M3 - Conference contribution

VL - 2016-October

SP - 3709

EP - 3716

BT - 2016 International Joint Conference on Neural Networks, IJCNN 2016

PB - Institute of Electrical and Electronics Engineers Inc.

ER -