Multi-evidence, multi-criteria, lazy associative document classification

Adriano Veloso, Wagner Meira, Marco Cristo, Marcos Gonçalves, Mohammed Zaki

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

19 Citations (Scopus)

Abstract

We present a novel approach for classifying documents that transparently combines different pieces of evidence (e.g., textual features of documents, links, and citations) through a data mining technique that generates rules associating these pieces of evidence with predefined classes. These rules can contain any number and mixture of the available evidence and are associated with several quality criteria, which can be used in conjunction to choose the "best" rule to apply at classification time. Our method can perform evidence enhancement by link forwarding/backwarding (i.e., navigating among documents related through citation), so that new pieces of link-based evidence are derived when necessary. Furthermore, instead of inducing a single model (or rule set) that is good on average for all predictions, the proposed approach employs a lazy method that delays the inductive process until a document is given for classification, thereby taking advantage of better qualitative evidence coming from the document. We conducted a systematic evaluation of the proposed approach using documents from the ACM Digital Library and from a Brazilian Web directory. In both collections, our approach outperformed all classifiers based on the best available evidence in isolation, as well as state-of-the-art multi-evidence classifiers. We also evaluated our approach on the standard WebKB collection, where it showed gains of 1% in accuracy while being 25 times faster. Further, our approach is extremely efficient in terms of computational performance, showing gains of more than one order of magnitude when compared against other multi-evidence classifiers.
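The lazy, rule-based idea in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the function name `lazy_classify`, the `text:`/`cites:` feature prefixes, the toy training data, and the choice of confidence and support as the two quality criteria (compared lexicographically) are all assumptions made for illustration, and link forwarding/backwarding is not modeled. The sketch only shows the general pattern of delaying rule generation until a test document arrives and mining rules from the evidence that document actually contains.

```python
from collections import Counter
from itertools import combinations


def lazy_classify(train, doc_features, max_rule_size=2):
    """Pick a class for one document using rules mined on demand.

    train: list of (feature_set, label) pairs (hypothetical toy format).
    doc_features: evidence found in the document to classify.
    Returns the label of the best rule, or None if no rule applies.
    """
    doc_features = frozenset(doc_features)
    body_count = Counter()  # training docs containing a rule body
    rule_count = Counter()  # training docs containing the body AND the label

    for features, label in train:
        # Lazy projection: keep only evidence shared with the test document,
        # so no rule is generated that could not apply to it.
        shared = sorted(frozenset(features) & doc_features)
        for size in range(1, max_rule_size + 1):
            for body in combinations(shared, size):
                body_count[body] += 1
                rule_count[(body, label)] += 1

    best_label, best_quality = None, None
    for (body, label), support in rule_count.items():
        confidence = support / body_count[body]
        # Assumed multi-criteria ordering: confidence first, then support.
        quality = (confidence, support)
        if best_quality is None or quality > best_quality:
            best_label, best_quality = label, quality
    return best_label


# Toy data mixing textual and citation evidence (entirely made up).
train = [
    ({"text:mining", "cites:d1"}, "Databases"),
    ({"text:mining", "text:rules"}, "Databases"),
    ({"text:kernel", "cites:d7"}, "OS"),
    ({"text:kernel", "text:mining"}, "OS"),
]
print(lazy_classify(train, {"text:mining", "cites:d1", "text:unseen"}))  # Databases
```

In this toy run the rule built from the citation evidence `cites:d1` has confidence 1.0 for "Databases", so it beats the weaker rule built from `text:mining` alone, which shows how mixed evidence types can compete under the same quality criteria.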

Original language: English
Title of host publication: International Conference on Information and Knowledge Management, Proceedings
Pages: 218-227
Number of pages: 10
DOIs: https://doi.org/10.1145/1183614.1183649
Publication status: Published - 1 Dec 2006
Externally published: Yes
Event: 15th ACM Conference on Information and Knowledge Management, CIKM 2006 - Arlington, VA, United States
Duration: 6 Nov 2006 to 11 Nov 2006

Keywords

  • Classification
  • Data mining
  • Lazy algorithms

ASJC Scopus subject areas

  • Business, Management and Accounting (all)

Cite this

Veloso, A., Meira, W., Cristo, M., Gonçalves, M., & Zaki, M. (2006). Multi-evidence, multi-criteria, lazy associative document classification. In International Conference on Information and Knowledge Management, Proceedings (pp. 218-227). https://doi.org/10.1145/1183614.1183649