A tensor-based clustering approach for multiple document classifications

Salvatore Romeo, Andrea Tagarelli, Francesco Gullo, Sergio Greco

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

2 Citations (Scopus)

Abstract

We propose a novel approach to the problem of document clustering when multiple organizations are provided for the documents in input. Besides considering the information on the text-based content of the documents, our approach exploits frequent associations of the documents in the groups across the existing classifications, in order to capture how documents tend to be grouped together orthogonally to different views. A third-order tensor for the document collection is built over both the space of terms and the space of the discovered frequent document-associations, and then it is decomposed to finally establish a unique encompassing clustering of documents. Preliminary experiments conducted on a document clustering benchmark have shown the potential of the approach to capture the multi-view structure of existing organizations for a given collection of documents.
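The abstract outlines three steps: mine frequent document associations across the input classifications, build a third-order documents × terms × associations tensor, and decompose it into one clustering. It does not say how tensor entries are weighted, which decomposition is used, or how clusters are read off the factors. The Python sketch below therefore fills those gaps with its own assumptions and should not be read as the authors' method. It assumes that every group of every classification is one transaction of document ids. Frequent associations are itemsets of at least two documents, mined with a plain Apriori pass at an absolute `min_support`. Entry X[d, t, a] is the weight of term t in document d when d belongs to association a, and 0 otherwise. The decomposition is a nonnegative CP (PARAFAC) model fitted by multiplicative updates, and each document is assigned to the component with its largest loading. The function names (`frequent_itemsets`, `build_tensor`, `nonneg_cp`, `cluster_documents`) and the toy data are hypothetical.

```python
import numpy as np
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Plain Apriori: every item set contained in at least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]
    level = sorted({frozenset([i]) for t in transactions for i in t}, key=sorted)
    result, k = [], 1
    while level:
        frequent = [c for c in level if sum(c <= t for t in transactions) >= min_support]
        result.extend(frequent)
        k += 1
        known = set(frequent)
        # Join: union of two frequent (k-1)-sets that has exactly k items.
        candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k}
        # Prune: keep a candidate only if all of its (k-1)-subsets are frequent.
        level = sorted(
            (c for c in candidates if all(frozenset(s) in known for s in combinations(c, k - 1))),
            key=sorted,
        )
    return result


def build_tensor(term_weights, associations):
    """ASSUMPTION: X[d, t, a] = term_weights[d, t] if document d is in association a, else 0."""
    member = np.zeros((term_weights.shape[0], len(associations)))
    for a, docs in enumerate(associations):
        member[sorted(docs), a] = 1.0
    return term_weights[:, :, None] * member[:, None, :]


def nonneg_cp(X, rank, n_iter=500, seed=0, eps=1e-12):
    """ASSUMPTION: nonnegative CP/PARAFAC fitted with multiplicative updates."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.random((n, rank)) + 0.1 for n in X.shape)
    for _ in range(n_iter):
        # Scale each factor by (data term) / (same term computed on the current model).
        A *= np.einsum("ijk,jr,kr->ir", X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum("ijk,ir,kr->jr", X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum("ijk,ir,jr->kr", X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C


def cluster_documents(term_weights, classifications, n_clusters, min_support=2):
    # ASSUMPTION: each group of each input classification is one transaction of doc ids.
    groups = [g for clf in classifications for g in clf]
    # Keep only associations of two or more documents (singletons carry no co-grouping).
    assocs = [s for s in frequent_itemsets(groups, min_support) if len(s) >= 2]
    A, _, _ = nonneg_cp(build_tensor(term_weights, assocs), n_clusters)
    # ASSUMPTION: hard assignment to the component with the largest document loading.
    return assocs, A.argmax(axis=1)


# Hypothetical toy input: 6 documents, 4 terms, three alternative classifications.
W = np.array([[2, 1, 0, 0], [1, 2, 0, 0], [2, 2, 0, 0],
              [0, 0, 2, 1], [0, 0, 1, 2], [0, 0, 2, 2]], dtype=float)
views = [
    [{0, 1, 2}, {3, 4, 5}],
    [{0, 1}, {2, 3}, {4, 5}],
    [{0, 2}, {1}, {3, 4, 5}],
]
assocs, labels = cluster_documents(W, views, n_clusters=2)
print([sorted(s) for s in assocs])  # -> [[0, 1], [0, 2], [3, 4], [3, 5], [4, 5], [3, 4, 5]]
print(labels)
```

The mined associations are deterministic. The labels depend on the random initialization, because multiplicative updates only reach a local optimum, so in practice one would compare several seeds. In this construction, a document that belongs to no frequent association of two or more documents gets an all-zero tensor slice and therefore receives an arbitrary label.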

Original language: English
Title of host publication: ICPRAM 2013 - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods
Pages: 200-205
Number of pages: 6
Publication status: Published, 27 May 2013
Externally published: Yes
Event: 2nd International Conference on Pattern Recognition Applications and Methods, ICPRAM 2013, Barcelona, Spain
Duration: 15 Feb 2013 to 18 Feb 2013

Keywords

  • Document clustering
  • Itemset mining
  • Tensor modeling and decomposition

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition

Cite this

Romeo, S., Tagarelli, A., Gullo, F., & Greco, S. (2013). A tensor-based clustering approach for multiple document classifications. In ICPRAM 2013 - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods (pp. 200-205).

@inproceedings{c93cb20e117b4ba8bce3303e3c49badd,
title = "A tensor-based clustering approach for multiple document classifications",
abstract = "We propose a novel approach to the problem of document clustering when multiple organizations are provided for the documents in input. Besides considering the information on the text-based content of the documents, our approach exploits frequent associations of the documents in the groups across the existing classifications, in order to capture how documents tend to be grouped together orthogonally to different views. A third-order tensor for the document collection is built over both the space of terms and the space of the discovered frequent document-associations, and then it is decomposed to finally establish a unique encompassing clustering of documents. Preliminary experiments conducted on a document clustering benchmark have shown the potential of the approach to capture the multi-view structure of existing organizations for a given collection of documents.",
keywords = "Document clustering, Itemset mining, Tensor modeling and decomposition",
author = "Salvatore Romeo and Andrea Tagarelli and Francesco Gullo and Sergio Greco",
year = "2013",
month = may,
day = "27",
language = "English",
isbn = "9789898565419",
pages = "200--205",
booktitle = "ICPRAM 2013 - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods",

}

TY  - GEN
T1  - A tensor-based clustering approach for multiple document classifications
AU  - Romeo, Salvatore
AU  - Tagarelli, Andrea
AU  - Gullo, Francesco
AU  - Greco, Sergio
PY  - 2013/5/27
Y1  - 2013/5/27
N2  - We propose a novel approach to the problem of document clustering when multiple organizations are provided for the documents in input. Besides considering the information on the text-based content of the documents, our approach exploits frequent associations of the documents in the groups across the existing classifications, in order to capture how documents tend to be grouped together orthogonally to different views. A third-order tensor for the document collection is built over both the space of terms and the space of the discovered frequent document-associations, and then it is decomposed to finally establish a unique encompassing clustering of documents. Preliminary experiments conducted on a document clustering benchmark have shown the potential of the approach to capture the multi-view structure of existing organizations for a given collection of documents.
AB  - We propose a novel approach to the problem of document clustering when multiple organizations are provided for the documents in input. Besides considering the information on the text-based content of the documents, our approach exploits frequent associations of the documents in the groups across the existing classifications, in order to capture how documents tend to be grouped together orthogonally to different views. A third-order tensor for the document collection is built over both the space of terms and the space of the discovered frequent document-associations, and then it is decomposed to finally establish a unique encompassing clustering of documents. Preliminary experiments conducted on a document clustering benchmark have shown the potential of the approach to capture the multi-view structure of existing organizations for a given collection of documents.
KW  - Document clustering
KW  - Itemset mining
KW  - Tensor modeling and decomposition
UR  - http://www.scopus.com/inward/record.url?scp=84877980690&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84877980690&partnerID=8YFLogxK
M3  - Conference contribution
SN  - 9789898565419
SP  - 200
EP  - 205
BT  - ICPRAM 2013 - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods
ER  - 