Multi-evidence, multi-criteria, lazy associative document classification

Adriano Veloso, Wagner Meira, Marco Cristo, Marcos Gonçalves, Mohammed Zaki

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

19 Citations (Scopus)

Abstract

We present a novel approach for classifying documents that combines different pieces of evidence (e.g., textual features of documents, links, and citations) transparently, through a data mining technique which generates rules associating these pieces of evidence to predefined classes. These rules can contain any number and mixture of the available evidence and are associated with several quality criteria which can be used in conjunction to choose the "best" rule to be applied at classification time. Our method is able to perform evidence enhancement by link forwarding/backwarding (i.e., navigating among documents related through citation), so that new pieces of link-based evidence are derived when necessary. Furthermore, instead of inducing a single model (or rule set) that is good on average for all predictions, the proposed approach employs a lazy method which delays the inductive process until a document is given for classification, therefore taking advantage of better qualitative evidence coming from the document. We conducted a systematic evaluation of the proposed approach using documents from the ACM Digital Library and from a Brazilian Web directory. In both collections, our approach outperformed all classifiers based on the best available evidence in isolation, as well as state-of-the-art multi-evidence classifiers. We also evaluated our approach using the standard WebKB collection, where it showed gains of 1% in accuracy while being 25 times faster. Further, our approach is extremely efficient in terms of computational performance, showing gains of more than one order of magnitude when compared against other multi-evidence classifiers.
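
The lazy, rule-based idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name lazy_classify, the "text:" and "cites:" feature prefixes, the toy training set, and the ordering of the quality criteria (confidence, then support, then rule size) are all assumptions made for illustration. The sketch delays rule induction until a document arrives, keeps only the training evidence that the document shares, and then picks the "best" rule using several criteria in conjunction.

```python
from collections import Counter
from itertools import combinations


def lazy_classify(training, doc_features, max_rule_size=2, min_support=1):
    """Classify one document by inducing rules on demand (illustrative sketch).

    training: list of (feature_set, label) pairs; features are strings that
              may mix evidence types, e.g. "text:mining" or "cites:d1".
    doc_features: the evidence available for the document to classify.
    Returns (label, antecedent) for the best rule, or (None, None).
    """
    doc_features = frozenset(doc_features)

    # Lazy step: project the training data onto the document's own evidence,
    # so that only rules applicable to this document are ever generated.
    projected = []
    for feats, label in training:
        shared = frozenset(feats) & doc_features
        if shared:
            projected.append((shared, label))

    antecedent_count = Counter()  # how often each antecedent occurs
    rule_count = Counter()        # how often antecedent and label co-occur
    for feats, label in projected:
        for size in range(1, max_rule_size + 1):
            for antecedent in combinations(sorted(feats), size):
                antecedent_count[antecedent] += 1
                rule_count[(antecedent, label)] += 1

    best = None
    for (antecedent, label), support in rule_count.items():
        if support < min_support:
            continue
        confidence = support / antecedent_count[antecedent]
        # Assumed multi-criteria ordering: confidence, support, specificity.
        key = (confidence, support, len(antecedent))
        if best is None or key > best[0]:
            best = (key, antecedent, label)

    if best is None:
        return None, None
    return best[2], best[1]


# Hypothetical toy data mixing textual and citation evidence.
training = [
    ({"text:mining", "cites:d1"}, "databases"),
    ({"text:mining", "text:rules"}, "databases"),
    ({"text:retrieval", "cites:d2"}, "ir"),
    ({"text:retrieval", "text:mining"}, "ir"),
]
label, rule = lazy_classify(training, {"text:mining", "cites:d1", "text:query"})
print(label, rule)
```

In this toy run the rule that combines a citation with a term is preferred over either piece of evidence alone, because it ties on confidence and support with the citation-only rule and wins on specificity. A real system would use the quality criteria and the link forwarding/backwarding step described in the paper, neither of which is reproduced here.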

Original language: English
Title of host publication: International Conference on Information and Knowledge Management, Proceedings
Pages: 218-227
Number of pages: 10
DOIs: https://doi.org/10.1145/1183614.1183649
Publication status: Published - 1 Dec 2006
Externally published: Yes
Event: 15th ACM Conference on Information and Knowledge Management, CIKM 2006 - Arlington, VA, United States
Duration: 6 Nov 2006 to 11 Nov 2006

Other

Other: 15th ACM Conference on Information and Knowledge Management, CIKM 2006
Country: United States
City: Arlington, VA
Period: 6/11/06 to 11/11/06

Fingerprint

Multi-criteria
Document classification
Classifier
Citations
Data mining
Evaluation
Enhancement
Digital libraries
Prediction
World Wide Web
Isolation

Keywords

  • Classification
  • Data mining
  • Lazy algorithms

ASJC Scopus subject areas

  • Business, Management and Accounting (all)

Cite this

Veloso, A., Meira, W., Cristo, M., Gonçalves, M., & Zaki, M. (2006). Multi-evidence, multi-criteria, lazy associative document classification. In International Conference on Information and Knowledge Management, Proceedings (pp. 218-227) https://doi.org/10.1145/1183614.1183649

Multi-evidence, multi-criteria, lazy associative document classification. / Veloso, Adriano; Meira, Wagner; Cristo, Marco; Gonçalves, Marcos; Zaki, Mohammed.

International Conference on Information and Knowledge Management, Proceedings. 2006. p. 218-227.

Veloso, A, Meira, W, Cristo, M, Gonçalves, M & Zaki, M 2006, Multi-evidence, multi-criteria, lazy associative document classification. in International Conference on Information and Knowledge Management, Proceedings. pp. 218-227, 15th ACM Conference on Information and Knowledge Management, CIKM 2006, Arlington, VA, United States, 6/11/06. https://doi.org/10.1145/1183614.1183649
Veloso A, Meira W, Cristo M, Gonçalves M, Zaki M. Multi-evidence, multi-criteria, lazy associative document classification. In International Conference on Information and Knowledge Management, Proceedings. 2006. p. 218-227 https://doi.org/10.1145/1183614.1183649
Veloso, Adriano ; Meira, Wagner ; Cristo, Marco ; Gonçalves, Marcos ; Zaki, Mohammed. / Multi-evidence, multi-criteria, lazy associative document classification. International Conference on Information and Knowledge Management, Proceedings. 2006. pp. 218-227
@inproceedings{6b9fb64c0d15454bbcb263fec17e2670,
title = "Multi-evidence, multi-criteria, lazy associative document classification",
abstract = "We present a novel approach for classifying documents that combines different pieces of evidence (e.g., textual features of documents, links, and citations) transparently, through a data mining technique which generates rules associating these pieces of evidence to predefined classes. These rules can contain any number and mixture of the available evidence and are associated with several quality criteria which can be used in conjunction to choose the {"}best{"} rule to be applied at classification time. Our method is able to perform evidence enhancement by link forwarding/backwarding (i.e., navigating among documents related through citation), so that new pieces of link-based evidence are derived when necessary. Furthermore, instead of inducing a single model (or rule set) that is good on average for all predictions, the proposed approach employs a lazy method which delays the inductive process until a document is given for classification, therefore taking advantage of better qualitative evidence coming from the document. We conducted a systematic evaluation of the proposed approach using documents from the ACM Digital Library and from a Brazilian Web directory. Our approach was able to outperform in both collections all classifiers based on the best available evidence in isolation as well as state-of-the-art multi-evidence classifiers. We also evaluated our approach using the standard WebKB collection, where our approach showed gains of 1{\%} in accuracy, being 25 times faster. Further, our approach is extremely efficient in terms of computational performance, showing gains of more than one order of magnitude when compared against other multi-evidence classifiers.",
keywords = "Classification, Data mining, Lazy algorithms",
author = "Adriano Veloso and Wagner Meira and Marco Cristo and Marcos Gon{\c{c}}alves and Mohammed Zaki",
year = "2006",
month = "12",
day = "1",
doi = "10.1145/1183614.1183649",
language = "English",
isbn = "1595934332",
pages = "218--227",
booktitle = "International Conference on Information and Knowledge Management, Proceedings",

}

TY - GEN

T1 - Multi-evidence, multi-criteria, lazy associative document classification

AU - Veloso, Adriano

AU - Meira, Wagner

AU - Cristo, Marco

AU - Gonçalves, Marcos

AU - Zaki, Mohammed

PY - 2006/12/1

Y1 - 2006/12/1

AB - We present a novel approach for classifying documents that combines different pieces of evidence (e.g., textual features of documents, links, and citations) transparently, through a data mining technique which generates rules associating these pieces of evidence to predefined classes. These rules can contain any number and mixture of the available evidence and are associated with several quality criteria which can be used in conjunction to choose the "best" rule to be applied at classification time. Our method is able to perform evidence enhancement by link forwarding/backwarding (i.e., navigating among documents related through citation), so that new pieces of link-based evidence are derived when necessary. Furthermore, instead of inducing a single model (or rule set) that is good on average for all predictions, the proposed approach employs a lazy method which delays the inductive process until a document is given for classification, therefore taking advantage of better qualitative evidence coming from the document. We conducted a systematic evaluation of the proposed approach using documents from the ACM Digital Library and from a Brazilian Web directory. Our approach was able to outperform in both collections all classifiers based on the best available evidence in isolation as well as state-of-the-art multi-evidence classifiers. We also evaluated our approach using the standard WebKB collection, where our approach showed gains of 1% in accuracy, being 25 times faster. Further, our approach is extremely efficient in terms of computational performance, showing gains of more than one order of magnitude when compared against other multi-evidence classifiers.

KW - Classification

KW - Data mining

KW - Lazy algorithms

UR - http://www.scopus.com/inward/record.url?scp=34547618445&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34547618445&partnerID=8YFLogxK

U2 - 10.1145/1183614.1183649

DO - 10.1145/1183614.1183649

M3 - Conference contribution

SN - 1595934332

SN - 9781595934338

SP - 218

EP - 227

BT - International Conference on Information and Knowledge Management, Proceedings

ER -