k-means--

A unified approach to clustering and outlier detection

Sanjay Chawla, Aristides Gionis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a unified approach for simultaneously clustering and discovering outliers in data. Our approach is formalized as a generalization of the k-means problem. We prove that the problem is NP-hard and then present a practical polynomial-time algorithm, which is guaranteed to converge to a local optimum. Furthermore, we extend our approach to all distance measures that can be expressed in the form of a Bregman divergence. Experiments on synthetic and real datasets demonstrate the effectiveness of our approach and the utility of carrying out both clustering and outlier detection in a concurrent manner. In particular, on the famous KDD Cup network-intrusion dataset, we were able to increase the precision of the outlier detection task by nearly 100% compared to the classical nearest-neighbor approach.
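The abstract does not list the steps of the algorithm. The following is a minimal sketch only, assuming the generalization of k-means takes the number of outliers as an extra input and alternates between setting aside the points farthest from their nearest centre and recomputing the centres from the remaining points. The names `kmeans_with_outliers`, `init_centers`, `n_outliers` and `divergence` are illustrative and are not taken from the paper. Squared Euclidean distance is the default; another Bregman divergence could be passed in, on the assumption that the centre is its second argument, in which case the arithmetic mean remains the minimizing centre.

```python
def sq_euclidean(x, y):
    """Squared Euclidean distance, the simplest Bregman divergence."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans_with_outliers(points, init_centers, n_outliers,
                         divergence=sq_euclidean, max_iter=100):
    """Cluster `points` around len(init_centers) centres while setting
    aside `n_outliers` points (hypothetical interface, not the paper's).

    Returns (centres, sorted indices of the points flagged as outliers).
    """
    centers = [tuple(c) for c in init_centers]
    k = len(centers)
    outliers = set()
    for _ in range(max_iter):
        # Nearest centre, and the divergence to it, for every point.
        nearest = []
        for p in points:
            j = min(range(k), key=lambda c: divergence(p, centers[c]))
            nearest.append((divergence(p, centers[j]), j))
        # The n_outliers points farthest from their nearest centre are set aside.
        order = sorted(range(len(points)),
                       key=lambda i: nearest[i][0], reverse=True)
        new_outliers = set(order[:n_outliers])
        # Each centre is recomputed from its non-outlier members only;
        # an empty cluster keeps its previous centre.
        new_centers = []
        for c in range(k):
            members = [points[i] for i in range(len(points))
                       if i not in new_outliers and nearest[i][1] == c]
            new_centers.append(mean(members) if members else centers[c])
        # Stop once neither the centres nor the outlier set change.
        if new_centers == centers and new_outliers == outliers:
            break
        centers, outliers = new_centers, new_outliers
    return centers, sorted(outliers)


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
            (100.0, 100.0)]
    print(kmeans_with_outliers(data, [(0.0, 0.0), (10.0, 10.0)], 1))
```

Like ordinary k-means, a procedure of this kind can only be expected to reach a local optimum, so the result depends on the initial centres; the sketch takes them as an explicit argument to keep the example deterministic.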

Original language: English
Title of host publication: Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013
Publisher: Siam Society
Pages: 189-197
Number of pages: 9
ISBN (Print): 9781611972627
Publication status: Published - 2013
Externally published: Yes
Event: SIAM International Conference on Data Mining, SDM 2013 - Austin, United States
Duration: 2 May 2013 - 4 May 2013

Other

Other: SIAM International Conference on Data Mining, SDM 2013
Country: United States
City: Austin
Period: 2/5/13 - 4/5/13

Fingerprint

Computational complexity
Polynomials
Experiments

ASJC Scopus subject areas

  • Computer Science Applications
  • Software

Cite this

Chawla, S., & Gionis, A. (2013). k-means--: A unified approach to clustering and outlier detection. In Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013 (pp. 189-197). Siam Society.

k-means--: A unified approach to clustering and outlier detection. / Chawla, Sanjay; Gionis, Aristides.

Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013. Siam Society, 2013. p. 189-197.

Chawla, S & Gionis, A 2013, k-means--: A unified approach to clustering and outlier detection. in Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013. Siam Society, pp. 189-197, SIAM International Conference on Data Mining, SDM 2013, Austin, United States, 2/5/13.
Chawla S, Gionis A. k-means--: A unified approach to clustering and outlier detection. In Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013. Siam Society. 2013. p. 189-197
Chawla, Sanjay ; Gionis, Aristides. / k-means--: A unified approach to clustering and outlier detection. Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013. Siam Society, 2013. pp. 189-197
@inproceedings{50ac65aa5d1a4ed2a680d14139d90ed1,
title = "k-means--: A unified approach to clustering and outlier detection",
abstract = "We present a unified approach for simultaneously clustering and discovering outliers in data. Our approach is formalized as a generalization of the k-means problem. We prove that the problem is NP-hard and then present a practical polynomial time algorithm, which is guaranteed to converge to a local optimum. Furthermore we extend our approach to all distance measures that can be expressed in the form of a Bregman divergence. Experiments on synthetic and real datasets demonstrate the effectiveness of our approach and the utility of carrying out both clustering and outlier detection in a concurrent manner. In particular on the famous KDD cup network-intrusion dataset, we were able to increase the precision of the outlier detection task by nearly 100{\%} compared to the classical nearest-neighbor approach.",
author = "Sanjay Chawla and Aristides Gionis",
year = "2013",
language = "English",
isbn = "9781611972627",
pages = "189--197",
booktitle = "Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013",
publisher = "Siam Society",

}

TY - GEN

T1 - k-means--

T2 - A unified approach to clustering and outlier detection

AU - Chawla, Sanjay

AU - Gionis, Aristides

PY - 2013

Y1 - 2013

AB - We present a unified approach for simultaneously clustering and discovering outliers in data. Our approach is formalized as a generalization of the k-means problem. We prove that the problem is NP-hard and then present a practical polynomial time algorithm, which is guaranteed to converge to a local optimum. Furthermore we extend our approach to all distance measures that can be expressed in the form of a Bregman divergence. Experiments on synthetic and real datasets demonstrate the effectiveness of our approach and the utility of carrying out both clustering and outlier detection in a concurrent manner. In particular on the famous KDD cup network-intrusion dataset, we were able to increase the precision of the outlier detection task by nearly 100% compared to the classical nearest-neighbor approach.

UR - http://www.scopus.com/inward/record.url?scp=84951002474&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84951002474&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9781611972627

SP - 189

EP - 197

BT - Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013

PB - Siam Society

ER -