Efficient incremental phrase-based document clustering

Ahmad M. Bakr, Noha Yousri, Mohamed A. Ismail

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

Document clustering has become indispensable for applications that extract information from very large corpora. Such applications face two main challenges: representing documents efficiently, together with an efficient similarity measure, and handling the dynamic nature of the corpus. This paper introduces an efficient document clustering model that stores and updates the clusters of a dataset incrementally. A new phrase-based similarity method is developed alongside the model to calculate the similarity between documents and clusters. Experimental results show that the new clustering model can achieve more accurate results than traditional algorithms.
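The abstract does not specify the paper's similarity measure or cluster representation, so the following is only a minimal sketch of the general idea of incremental, phrase-based clustering, not the authors' method. It assumes that a "phrase" is a word bigram, that document-to-cluster similarity is the Jaccard overlap of phrase sets, and that a new cluster is opened when no existing cluster reaches a threshold. All names (`phrases`, `jaccard`, `Cluster`, `IncrementalClusterer`) and the threshold value are hypothetical.

```python
from dataclasses import dataclass, field


def phrases(text: str, n: int = 2) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in text (assumed stand-in for 'phrases')."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class Cluster:
    doc_ids: list[int] = field(default_factory=list)
    phrase_set: set = field(default_factory=set)  # union of member phrases


class IncrementalClusterer:
    """Single-pass clusterer: each new document updates one cluster in place."""

    def __init__(self, threshold: float = 0.2, n: int = 2):
        self.threshold = threshold  # assumed cut-off, not taken from the paper
        self.n = n
        self.clusters: list[Cluster] = []

    def add(self, doc_id: int, text: str) -> int:
        """Assign one document and return the index of its cluster."""
        doc_phrases = phrases(text, self.n)
        best_index, best_score = -1, 0.0
        for index, cluster in enumerate(self.clusters):
            score = jaccard(doc_phrases, cluster.phrase_set)
            if score > best_score:
                best_index, best_score = index, score
        # Open a new cluster when nothing is similar enough.
        if best_index == -1 or best_score < self.threshold:
            self.clusters.append(Cluster())
            best_index = len(self.clusters) - 1
        target = self.clusters[best_index]
        target.doc_ids.append(doc_id)
        target.phrase_set |= doc_phrases  # incremental update, no re-clustering
        return best_index
```

Calling `add` once per arriving document keeps the clusters current without reprocessing earlier documents, which is the property the abstract refers to as incremental updating.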

Original language: English
Title of host publication: ICPR 2012 - 21st International Conference on Pattern Recognition
Pages: 517-520
Number of pages: 4
Publication status: Published - 1 Dec 2012
Event: 21st International Conference on Pattern Recognition, ICPR 2012 - Tsukuba, Japan
Duration: 11 Nov 2012 - 15 Nov 2012

Other

Other: 21st International Conference on Pattern Recognition, ICPR 2012
Country: Japan
City: Tsukuba
Period: 11/11/12 - 15/11/12

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition

Cite this

Bakr, A. M., Yousri, N., & Ismail, M. A. (2012). Efficient incremental phrase-based document clustering. In ICPR 2012 - 21st International Conference on Pattern Recognition (pp. 517-520). [6460185]

@inproceedings{324189a9a9774f8782f1d326eebeb447,
title = "Efficient incremental phrase-based document clustering",
author = "Bakr, Ahmad M. and Yousri, Noha and Ismail, Mohamed A.",
year = "2012",
month = "12",
day = "1",
language = "English",
isbn = "9784990644109",
pages = "517--520",
booktitle = "ICPR 2012 - 21st International Conference on Pattern Recognition",

}

TY - GEN

T1 - Efficient incremental phrase-based document clustering

AU - Bakr, Ahmad M.

AU - Yousri, Noha

AU - Ismail, Mohamed A.

PY - 2012/12/1

Y1 - 2012/12/1

UR - http://www.scopus.com/inward/record.url?scp=84874581717&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84874581717&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9784990644109

SP - 517

EP - 520

BT - ICPR 2012 - 21st International Conference on Pattern Recognition

ER -