A robust model for intelligent text classification

Roberto Basili, Alessandro Moschitti

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

Methods for incorporating linguistic content into text retrieval are receiving growing attention [16], [14]. Text categorization is an interesting area for evaluating and quantifying the impact of linguistic information. Work on text retrieval over the Internet suggests that embedding linguistic information at a suitable level within traditional quantitative approaches (e.g. sense distinctions for query expansion, as in [14]) is the crucial step in moving from the experimental stage to operational results. This kind of representational problem is also studied in this paper, where traditional methods for statistical text categorization are augmented via a systematic use of linguistic information. Again, as in [14], the addition of NLP capabilities also suggested a different application of existing methods in revised forms. This paper presents an extension of the Rocchio formula [11] as a feature weighting and selection model used as a basis for multilingual Information Extraction. It allows an effective exploitation of the available linguistic information, giving the latter greater emphasis, with significant gains in both data compression and accuracy. The result is an original statistical classifier fed with linguistic (i.e. more complex) features and characterized by the novel feature selection and weighting model. It outperforms existing systems while keeping most of their interesting properties (i.e. easy implementation, low complexity and high scalability). Extensive tests of the model suggest its application as a viable and robust tool for large-scale text classification and filtering, as well as a basic module for more complex scenarios.

Original language: English
Title of host publication: Proceedings of the International Conference on Tools with Artificial Intelligence
Pages: 265-272
Number of pages: 8
Publication status: Published - 2001
Externally published: Yes
Event: 13th International Conference on Tools with Artificial Intelligence - Dallas, TX, United States
Duration: 7 Nov 2001 to 9 Nov 2001

Other

Other: 13th International Conference on Tools with Artificial Intelligence
Country: United States
City: Dallas, TX
Period: 7/11/01 to 9/11/01

Fingerprint

Linguistics
Data compression
Scalability
Feature extraction
Classifiers
Internet

ASJC Scopus subject areas

  • Software

Cite this

Basili, R., & Moschitti, A. (2001). A robust model for intelligent text classification. In Proceedings of the International Conference on Tools with Artificial Intelligence (pp. 265-272)

A robust model for intelligent text classification. / Basili, Roberto; Moschitti, Alessandro.

Proceedings of the International Conference on Tools with Artificial Intelligence. 2001. p. 265-272.

Basili, R & Moschitti, A 2001, A robust model for intelligent text classification. in Proceedings of the International Conference on Tools with Artificial Intelligence. pp. 265-272, 13th International Conference on Tools with Artificial Intelligence, Dallas, TX, United States, 7/11/01.
Basili R, Moschitti A. A robust model for intelligent text classification. In Proceedings of the International Conference on Tools with Artificial Intelligence. 2001. p. 265-272
Basili, Roberto ; Moschitti, Alessandro. / A robust model for intelligent text classification. Proceedings of the International Conference on Tools with Artificial Intelligence. 2001. pp. 265-272
@inproceedings{ad7b4caae8c04bcab785fe17f58d72fa,
title = "A robust model for intelligent text classification",
abstract = "Methods for incorporating linguistic content into text retrieval are receiving growing attention [16], [14]. Text categorization is an interesting area for evaluating and quantifying the impact of linguistic information. Work on text retrieval over the Internet suggests that embedding linguistic information at a suitable level within traditional quantitative approaches (e.g. sense distinctions for query expansion, as in [14]) is the crucial step in moving from the experimental stage to operational results. This kind of representational problem is also studied in this paper, where traditional methods for statistical text categorization are augmented via a systematic use of linguistic information. Again, as in [14], the addition of NLP capabilities also suggested a different application of existing methods in revised forms. This paper presents an extension of the Rocchio formula [11] as a feature weighting and selection model used as a basis for multilingual Information Extraction. It allows an effective exploitation of the available linguistic information, giving the latter greater emphasis, with significant gains in both data compression and accuracy. The result is an original statistical classifier fed with linguistic (i.e. more complex) features and characterized by the novel feature selection and weighting model. It outperforms existing systems while keeping most of their interesting properties (i.e. easy implementation, low complexity and high scalability). Extensive tests of the model suggest its application as a viable and robust tool for large-scale text classification and filtering, as well as a basic module for more complex scenarios.",
author = "Roberto Basili and Alessandro Moschitti",
year = "2001",
language = "English",
pages = "265--272",
booktitle = "Proceedings of the International Conference on Tools with Artificial Intelligence",

}

TY - GEN

T1 - A robust model for intelligent text classification

AU - Basili, Roberto

AU - Moschitti, Alessandro

PY - 2001

Y1 - 2001

N2 - Methods for incorporating linguistic content into text retrieval are receiving growing attention [16], [14]. Text categorization is an interesting area for evaluating and quantifying the impact of linguistic information. Work on text retrieval over the Internet suggests that embedding linguistic information at a suitable level within traditional quantitative approaches (e.g. sense distinctions for query expansion, as in [14]) is the crucial step in moving from the experimental stage to operational results. This kind of representational problem is also studied in this paper, where traditional methods for statistical text categorization are augmented via a systematic use of linguistic information. Again, as in [14], the addition of NLP capabilities also suggested a different application of existing methods in revised forms. This paper presents an extension of the Rocchio formula [11] as a feature weighting and selection model used as a basis for multilingual Information Extraction. It allows an effective exploitation of the available linguistic information, giving the latter greater emphasis, with significant gains in both data compression and accuracy. The result is an original statistical classifier fed with linguistic (i.e. more complex) features and characterized by the novel feature selection and weighting model. It outperforms existing systems while keeping most of their interesting properties (i.e. easy implementation, low complexity and high scalability). Extensive tests of the model suggest its application as a viable and robust tool for large-scale text classification and filtering, as well as a basic module for more complex scenarios.

UR - http://www.scopus.com/inward/record.url?scp=0035556607&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0035556607&partnerID=8YFLogxK

M3 - Conference contribution

SP - 265

EP - 272

BT - Proceedings of the International Conference on Tools with Artificial Intelligence

ER -