A hybrid approach to optimize feature selection process in text classification

Roberto Basili, Alessandro Moschitti, Maria Teresa Pazienza

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Feature selection and weighting are the primary activities of every learning algorithm for text classification. Traditionally, these tasks are carried out separately in two distinct phases: the first is global feature selection during corpus pre-processing, and the second is the application of the feature weighting model. This means that two (or more) different techniques are used to optimize performance, even though a single algorithm may have a better chance of making the right choices. When the complete feature set is available, the classifier's learning algorithm can better relate complex features, such as linguistic ones (e.g. syntactic categories associated with words in the training material, or terminological expressions), to the suitable representation level. In [3] it was suggested that classifiers based on the generalized Rocchio formula can be used to weight features in category profiles, in order to exploit the selectivity of linguistic information in text classification. This paper describes a systematic study aimed at understanding the role of the Rocchio formula in the selection and weighting of linguistic features.
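The abstract does not reproduce the formula, so the following is only a minimal sketch, assuming the common Rocchio profile definition from the text-categorization literature: each feature's weight in a category profile is beta times its mean weight in the category's positive training documents minus gamma times its mean weight in the negative ones, with negative results clipped to zero. Under this assumption, the clipping is what lets one formula both weight and select features: any feature whose weight falls to zero drops out of the profile. The function name `rocchio_profile`, the dictionary-based document representation, and the defaults beta = gamma = 1 are hypothetical; this is not the authors' implementation, and how [3] parameterizes the generalized formula is not shown here.

```python
from typing import Dict, List

# Hypothetical sparse document representation: feature -> weight (e.g. tf-idf).
Doc = Dict[str, float]


def rocchio_profile(positives: List[Doc], negatives: List[Doc],
                    beta: float = 1.0, gamma: float = 1.0) -> Dict[str, float]:
    """Sketch of a Rocchio category profile (assumed standard form, not the paper's).

    weight(f) = max(0, beta * mean_pos(f) - gamma * mean_neg(f))
    Features whose weight is not positive are omitted, i.e. they are "selected out".
    """
    features = set()
    for doc in positives + negatives:
        features.update(doc)

    profile: Dict[str, float] = {}
    for f in features:
        # Mean weight of f over positive and negative examples (0 if a set is empty).
        pos = sum(d.get(f, 0.0) for d in positives) / len(positives) if positives else 0.0
        neg = sum(d.get(f, 0.0) for d in negatives) / len(negatives) if negatives else 0.0
        weight = beta * pos - gamma * neg
        if weight > 0.0:  # clipping at zero doubles as feature selection
            profile[f] = weight
    return profile


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    positives = [{"bank": 0.5, "loan": 0.5}, {"bank": 1.0}]
    negatives = [{"bank": 0.2, "match": 0.8}]
    print(sorted(rocchio_profile(positives, negatives)))             # ['bank', 'loan']
    print(sorted(rocchio_profile(positives, negatives, gamma=4.0)))  # ['loan']
```

In the toy run, raising gamma from 1 to 4 removes `bank` from the profile because its weight in the negative example is now penalized more heavily. This illustrates, under the stated assumptions, how the balance between beta and gamma controls how many features survive.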

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 320-326
Number of pages: 7
Volume: 2175
ISBN (Print): 3540426019, 9783540426011
Publication status: Published - 2001
Externally published: Yes
Event: 7th Congress of the Italian Association for Artificial Intelligence, AIIA 2001 - Bari, Italy
Duration: 25 Sep 2001 - 28 Sep 2001

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2175
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 7th Congress of the Italian Association for Artificial Intelligence, AIIA 2001
Country: Italy
City: Bari
Period: 25/9/01 - 28/9/01

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Citation

Basili, R., Moschitti, A., & Pazienza, M. T. (2001). A hybrid approach to optimize feature selection process in text classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2175, pp. 320-326). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 2175). Springer Verlag.