Natural document clustering by clique percolation in random graphs

Wei Gao, Kam Fai Wong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Document clustering techniques mostly depend on models that impose explicit and/or implicit a priori assumptions about the number, size, and disjointness of clusters, and/or the probability distribution of the clustered data. As a result, the clustering results tend to be unnatural and deviate to some degree from the intrinsic grouping of the documents in a corpus. We propose a novel graph-theoretic technique called Clique Percolation Clustering (CPC). It models clustering as a process of enumerating adjacent maximal cliques in a random graph that unveils the inherent structure of the underlying data, relaxing the commonly imposed constraints in order to discover natural overlapping clusters. Experiments show that CPC can outperform some typical algorithms on benchmark data sets and shed light on natural document clustering.
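The idea of grouping documents by enumerating adjacent maximal cliques can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define when two cliques count as adjacent, so the sketch assumes the rule from the standard clique percolation method (two cliques of size at least k are adjacent when they share at least k - 1 nodes). The toy graph, the parameter k, and the function names build_adjacency, maximal_cliques, and percolate are all hypothetical. Nodes stand for documents, and an edge is assumed to mean that two documents are similar enough to be linked.

```python
from itertools import combinations


def build_adjacency(n, edges):
    """Build an undirected adjacency map for nodes 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the node with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def percolate(cliques, k):
    """Merge cliques of size >= k that share at least k - 1 nodes.

    Assumed adjacency rule (standard clique percolation), not taken
    from the paper. Returns clusters as sorted lists; a node may
    appear in more than one cluster.
    """
    big = [c for c in cliques if len(c) >= k]
    parent = list(range(len(big)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(big)), 2):
        if len(big[i] & big[j]) >= k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, c in enumerate(big):
        groups.setdefault(find(i), set()).update(c)
    return sorted(sorted(g) for g in groups.values())


# Toy document graph: 7 documents, document 6 is linked to nothing.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = build_adjacency(7, edges)
clusters = percolate(maximal_cliques(adj), k=3)
print(clusters)  # [[0, 1, 2, 3], [3, 4, 5]]
```

In this toy example document 3 belongs to both clusters and document 6 belongs to none, which shows how such a scheme can produce overlapping clusters without fixing the number or size of clusters in advance.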

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 119-131
Number of pages: 13
Volume: 4182 LNCS
Publication status: Published - 30 Nov 2006
Externally published: Yes
Event: 3rd Asia Information Retrieval Symposium, AIRS 2006 - Singapore, Singapore
Duration: 16 Oct 2006 to 18 Oct 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4182 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 3rd Asia Information Retrieval Symposium, AIRS 2006
Country: Singapore
City: Singapore
Period: 16/10/06 to 18/10/06

ASJC Scopus subject areas

  • Computer Science(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Theoretical Computer Science

Cite this

Gao, W., & Wong, K. F. (2006). Natural document clustering by clique percolation in random graphs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4182 LNCS, pp. 119-131).