A robust model for intelligent text classification

Roberto Basili, Alessandro Moschitti

Research output: Contribution to journal, Conference article

10 Citations (Scopus)

Abstract

Methods for taking linguistic content into account in text retrieval are receiving growing attention [16], [14]. Text categorization is an interesting area for evaluating and quantifying the impact of linguistic information. Work on text retrieval over the Internet suggests that embedding linguistic information at a suitable level within traditional quantitative approaches (e.g. sense distinctions for query expansion, as in [14]) is the crucial issue in moving from the experimental stage to operational results. This kind of representational problem is also studied in this paper, where traditional methods for statistical text categorization are augmented via a systematic use of linguistic information. Again, as in [14], the addition of NLP capabilities also suggested a different application of existing methods in revised forms. This paper presents an extension of the Rocchio formula [11] as a feature weighting and selection model used as a basis for multilingual Information Extraction. It allows an effective exploitation of the available linguistic information, giving it greater emphasis and yielding significant gains in both data compression and accuracy. The result is an original statistical classifier fed with linguistic (i.e. more complex) features and characterized by the novel feature selection and weighting model. It outperforms existing systems while keeping most of their interesting properties (i.e. easy implementation, low complexity and high scalability). Extensive tests of the model suggest its application as a viable and robust tool for large-scale text classification and filtering, as well as a basic module for more complex scenarios.

Original language: English
Pages (from-to): 265-272
Number of pages: 8
Journal: Proceedings of the International Conference on Tools with Artificial Intelligence
Publication status: Published - 1 Dec 2001
Event: 13th International Conference on Tools with Artificial Intelligence - Dallas, TX, United States
Duration: 7 Nov 2001 → 9 Nov 2001

ASJC Scopus subject areas

  • Software