k-means--: A unified approach to clustering and outlier detection

Sanjay Chawla, Aristides Gionis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a unified approach for simultaneously clustering and discovering outliers in data. Our approach is formalized as a generalization of the k-means problem. We prove that the problem is NP-hard and then present a practical polynomial-time algorithm, which is guaranteed to converge to a local optimum. Furthermore, we extend our approach to all distance measures that can be expressed in the form of a Bregman divergence. Experiments on synthetic and real datasets demonstrate the effectiveness of our approach and the utility of carrying out clustering and outlier detection concurrently. In particular, on the famous KDD Cup network-intrusion dataset, we were able to increase the precision of the outlier detection task by nearly 100% compared to the classical nearest-neighbor approach.
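The abstract does not spell out the algorithm's steps, so the following is only a minimal sketch of one way a k-means generalization with outliers can work, under stated assumptions: a Lloyd-style loop in which, on every iteration, the `l` points farthest from their nearest center are set aside as outliers before the centers are recomputed. The function name `kmeans_with_outliers`, the parameter `l` (number of outliers), the explicit `init_centers` argument, and the use of squared Euclidean distance (one instance of a Bregman divergence) are all illustrative choices, not details taken from this record.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """For each point, return (squared distance, index) of its nearest center."""
    return [min((sq_dist(p, c), j) for j, c in enumerate(centers)) for p in points]


def farthest(nearest, l):
    """Indices of the l points with the largest distance to their nearest center."""
    order = sorted(range(len(nearest)), key=lambda i: nearest[i][0], reverse=True)
    return set(order[:l])


def kmeans_with_outliers(points, init_centers, l, max_iter=100):
    """Hypothetical sketch: cluster `points` while setting aside `l` outliers.

    Returns (centers, labels, outliers); labels[i] is None for an outlier.
    """
    centers = [tuple(c) for c in init_centers]
    k = len(centers)
    for _ in range(max_iter):
        nearest = assign(points, centers)
        outliers = farthest(nearest, l)
        new_centers = []
        for j in range(k):
            # Only non-outlier points contribute to the updated center.
            members = [points[i] for i in range(len(points))
                       if i not in outliers and nearest[i][1] == j]
            if members:
                dim = len(members[0])
                new_centers.append(tuple(
                    sum(m[t] for m in members) / len(members) for t in range(dim)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # centers stopped moving: a local optimum was reached
        centers = new_centers
    # Final assignment against the final centers.
    nearest = assign(points, centers)
    outliers = farthest(nearest, l)
    labels = [None if i in outliers else nearest[i][1] for i in range(len(points))]
    return centers, labels, sorted(outliers)
```

With `l = 0` the loop reduces to ordinary k-means. As with k-means itself, the result depends on the initial centers, which is consistent with the abstract's guarantee of convergence to a local (not global) optimum.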

Original language: English
Title of host publication: Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013
Publisher: Siam Society
Pages: 189-197
Number of pages: 9
ISBN (Print): 9781611972627
Publication status: Published - 2013
Externally published: Yes
Event: SIAM International Conference on Data Mining, SDM 2013 - Austin, United States
Duration: 2 May 2013 to 4 May 2013

Other

Other: SIAM International Conference on Data Mining, SDM 2013
Country: United States
City: Austin
Period: 2/5/13 to 4/5/13

ASJC Scopus subject areas

  • Computer Science Applications
  • Software

Cite this

Chawla, S., & Gionis, A. (2013). k-means--: A unified approach to clustering and outlier detection. In Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013 (pp. 189-197). Siam Society.