Natural document clustering by clique percolation in random graphs

Wei Gao, Kam Fai Wong

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

Document clustering techniques mostly depend on models that impose explicit and/or implicit a priori assumptions about the number, size, and disjointness of clusters, and/or the probability distribution of the clustered data. As a result, the clustering results tend to be unnatural and stray, to a greater or lesser degree, from the intrinsic grouping of the documents in a corpus. We propose a novel graph-theoretic technique called Clique Percolation Clustering (CPC). It models clustering as a process of enumerating adjacent maximal cliques in a random graph that unveils the inherent structure of the underlying data, and it relaxes the commonly imposed constraints in order to discover natural overlapping clusters. Experiments show that CPC can outperform some typical algorithms on benchmark data sets and sheds light on natural document clustering.
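To make the idea of "enumerating adjacent maximal cliques" concrete, here is a minimal Python sketch of generic clique percolation on a small document graph. It is not the authors' CPC implementation. It assumes the common definition in which two cliques of size at least k are adjacent when they share at least k - 1 nodes, and it assumes the graph is already given as a list of edges between document ids (for example, pairs whose similarity exceeds some threshold). The function names, the parameter k, and the toy edge list are all hypothetical.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (pivoting variant)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_percolation(edges, k=3):
    """Merge maximal cliques of size >= k that share >= k - 1 nodes.

    Assumption: this adjacency rule is the usual clique-percolation one,
    not necessarily the exact rule used in the paper.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]
    parent = list(range(len(cliques)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    clusters = {}
    for i, c in enumerate(cliques):
        clusters.setdefault(find(i), set()).update(c)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical toy graph: d4 belongs to two groups of documents.
edges = [
    ("d1", "d2"), ("d1", "d3"), ("d2", "d3"),
    ("d2", "d4"), ("d3", "d4"),
    ("d4", "d5"), ("d4", "d6"), ("d5", "d6"),
]
clusters = clique_percolation(edges, k=3)
print(clusters)
```

In this toy graph the number of clusters is not fixed in advance, and document d4 ends up in both clusters, which is the kind of overlapping grouping the abstract describes.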

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 119-131
Number of pages: 13
Volume: 4182 LNCS
Publication status: Published - 30 Nov 2006
Externally published: Yes
Event: 3rd Asia Information Retrieval Symposium, AIRS 2006 - Singapore, Singapore
Duration: 16 Oct 2006 to 18 Oct 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4182 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 3rd Asia Information Retrieval Symposium, AIRS 2006
Country: Singapore
City: Singapore
Period: 16/10/06 to 18/10/06

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Gao, W., & Wong, K. F. (2006). Natural document clustering by clique percolation in random graphs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4182 LNCS, pp. 119-131).

@inproceedings{ecbd2304f81a42ed935a72f3924a98b0,
title = "Natural document clustering by clique percolation in random graphs",
author = "Wei Gao and Wong, {Kam Fai}",
year = "2006",
month = "11",
day = "30",
language = "English",
isbn = "3540457801",
volume = "4182 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "119--131",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Natural document clustering by clique percolation in random graphs

AU - Gao, Wei

AU - Wong, Kam Fai

PY - 2006/11/30

Y1 - 2006/11/30

UR - http://www.scopus.com/inward/record.url?scp=33751380750&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33751380750&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:33751380750

SN - 3540457801

SN - 9783540457800

VL - 4182 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 119

EP - 131

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -