k-means--: A unified approach to clustering and outlier detection

Sanjay Chawla, Aristides Gionis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

74 Citations (Scopus)

Abstract

We present a unified approach for simultaneously clustering and discovering outliers in data. Our approach is formalized as a generalization of the k-means problem. We prove that the problem is NP-hard and then present a practical polynomial-time algorithm, which is guaranteed to converge to a local optimum. Furthermore, we extend our approach to all distance measures that can be expressed in the form of a Bregman divergence. Experiments on synthetic and real datasets demonstrate the effectiveness of our approach and the utility of carrying out both clustering and outlier detection in a concurrent manner. In particular, on the famous KDD Cup network-intrusion dataset, we were able to increase the precision of the outlier detection task by nearly 100% compared to the classical nearest-neighbor approach.
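The abstract does not spell out the steps of the algorithm, so the following is only a minimal sketch of how an alternating procedure of this kind could look, not the authors' implementation. It assumes squared Euclidean distance, a fixed number of outliers, and caller-supplied starting centers. The function name `cluster_with_outliers` and its parameters (`init_centers`, `num_outliers`, `max_iters`) are hypothetical and chosen for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cluster_with_outliers(points, init_centers, num_outliers, max_iters=100):
    """Alternate between flagging outliers and updating cluster centers.

    Assumption (not taken from the abstract): in every round, the
    num_outliers points farthest from their nearest center are set aside,
    and centers are recomputed as the mean of the remaining points.
    Returns (centers, sorted list of outlier indices).
    """
    centers = [tuple(c) for c in init_centers]
    outliers = set()
    for _ in range(max_iters):
        # For each point: (distance to nearest center, index of that center).
        nearest = [
            min((sq_dist(p, c), j) for j, c in enumerate(centers))
            for p in points
        ]
        # Rank points by distance to their nearest center, farthest first.
        order = sorted(range(len(points)),
                       key=lambda i: nearest[i][0], reverse=True)
        new_outliers = set(order[:num_outliers])
        # Recompute each center from its non-outlier members.
        new_centers = []
        for j in range(len(centers)):
            members = [points[i] for i in range(len(points))
                       if i not in new_outliers and nearest[i][1] == j]
            if members:
                mean = tuple(sum(x) / len(members) for x in zip(*members))
                new_centers.append(mean)
            else:
                # Keep the old center if the cluster lost all its members.
                new_centers.append(centers[j])
        if new_centers == centers and new_outliers == outliers:
            break  # Nothing changed: a local optimum has been reached.
        centers, outliers = new_centers, new_outliers
    return centers, sorted(outliers)


# Two tight groups plus one distant point (index 8).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (100, 100)]
centers, outlier_idx = cluster_with_outliers(
    points, init_centers=[(0, 0), (10, 10)], num_outliers=1)
```

As with ordinary k-means, the result depends on the starting centers. A poor initialization (for example, one that places a center on the distant point) can converge to a different local optimum.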

Original language: English
Title of host publication: SIAM International Conference on Data Mining 2013, SDM 2013
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 189-197
Number of pages: 9
ISBN (Print): 9781627487245
Publication status: Published - 2013
Externally published: Yes
Event: 13th SIAM International Conference on Data Mining, SDM 2013 - Austin, United States
Duration: 2 May 2013 to 4 May 2013

Other

Other: 13th SIAM International Conference on Data Mining, SDM 2013
Country: United States
City: Austin
Period: 2/5/13 to 4/5/13

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Information Systems
  • Signal Processing
  • Software

Cite this

Chawla, S., & Gionis, A. (2013). k-means--: A unified approach to clustering and outlier detection. In SIAM International Conference on Data Mining 2013, SDM 2013 (pp. 189-197). Society for Industrial and Applied Mathematics Publications.