A hybrid approach to optimize feature selection process in text classification

Roberto Basili, Alessandro Moschitti, Maria Teresa Pazienza

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Feature selection and weighting are the primary activities of every learning algorithm for text classification. Traditionally, these tasks are carried out separately in two distinct phases: the first is global feature selection during corpus pre-processing, and the second is the application of the feature weighting model. This means that two (or more) different techniques are used to optimize performance, even though a single algorithm may have a better chance of making the right choices. When the complete feature set is available, the classifier learning algorithm can better relate complex features, such as linguistic ones (e.g. syntactic categories associated with words in the training material, or terminological expressions), to the suitable representation level. In [3] it was suggested that classifiers based on the generalized Rocchio formula can be used to weight features in category profiles in order to exploit the selectivity of linguistic information techniques in text classification. This paper describes a systematic study aimed at understanding the role of the Rocchio formula in the selection and weighting of linguistic features.
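As an illustration of the Rocchio-based weighting mentioned in the abstract, the following is a minimal sketch of the standard Rocchio category profile; it is not the generalized formula studied in the paper, whose exact form is not given here. Each feature receives the weight max(0, beta * mean weight in positive documents - gamma * mean weight in negative documents), so features whose weight would be non-positive drop out of the profile, which is how a single formula can both weight and select. The function name `rocchio_profile`, the dict-based document vectors, and the default values of `beta` and `gamma` are assumptions made for illustration.

```python
from typing import Dict, List

# A document is a sparse vector: feature name -> weight (e.g. a TF-IDF value).
Vector = Dict[str, float]


def rocchio_profile(
    positive: List[Vector],
    negative: List[Vector],
    beta: float = 16.0,   # assumed default, not taken from the paper
    gamma: float = 4.0,   # assumed default, not taken from the paper
) -> Vector:
    """Build a category profile with the standard Rocchio formula.

    Features whose Rocchio weight is not positive are left out of the
    profile, so weighting and selection happen in the same step.
    """
    features = set()
    for doc in positive + negative:
        features.update(doc)

    profile: Vector = {}
    for f in features:
        # Mean weight of the feature inside and outside the category.
        pos_mean = (
            sum(d.get(f, 0.0) for d in positive) / len(positive)
            if positive else 0.0
        )
        neg_mean = (
            sum(d.get(f, 0.0) for d in negative) / len(negative)
            if negative else 0.0
        )
        weight = beta * pos_mean - gamma * neg_mean
        if weight > 0.0:  # equivalent to max(0, weight) plus pruning zeros
            profile[f] = weight
    return profile


if __name__ == "__main__":
    # Toy data, invented for this example only.
    in_category = [{"bank": 1.0, "loan": 2.0}, {"bank": 1.0}]
    out_of_category = [{"bank": 2.0, "goal": 3.0}]
    print(rocchio_profile(in_category, out_of_category))
```

In this toy run the feature "goal" occurs only outside the category, so it is excluded from the profile. Raising `gamma` relative to `beta` makes the selection stricter, since more features end up with a non-positive weight.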

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 320-326
Number of pages: 7
Volume: 2175
ISBN (Print): 3540426019, 9783540426011
Publication status: Published - 2001
Externally published: Yes
Event: 7th Congress of the Italian Association for Artificial Intelligence, AIIA 2001 - Bari, Italy
Duration: 25 Sep 2001 - 28 Sep 2001

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2175
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 7th Congress of the Italian Association for Artificial Intelligence, AIIA 2001
Country: Italy
City: Bari
Period: 25/9/01 - 28/9/01

Fingerprint

Text Classification
Hybrid Approach
Linguistics
Feature Selection
Feature Weighting
Feature extraction
Optimise
Learning algorithms
Learning Algorithm
Classifiers
Classifier
Syntactics
Selectivity
Weighting
Preprocessing
Distinct
Processing
Model

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Basili, R., Moschitti, A., & Pazienza, M. T. (2001). A hybrid approach to optimize feature selection process in text classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2175, pp. 320-326). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 2175). Springer Verlag.

A hybrid approach to optimize feature selection process in text classification. / Basili, Roberto; Moschitti, Alessandro; Pazienza, Maria Teresa.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 2175 Springer Verlag, 2001. p. 320-326 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 2175).

Basili, R, Moschitti, A & Pazienza, MT 2001, A hybrid approach to optimize feature selection process in text classification. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 2175, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 2175, Springer Verlag, pp. 320-326, 7th Congress of the Italian Association for Artificial Intelligence, AIIA 2001, Bari, Italy, 25/9/01.
Basili R, Moschitti A, Pazienza MT. A hybrid approach to optimize feature selection process in text classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 2175. Springer Verlag. 2001. p. 320-326. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
Basili, Roberto ; Moschitti, Alessandro ; Pazienza, Maria Teresa. / A hybrid approach to optimize feature selection process in text classification. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 2175 Springer Verlag, 2001. pp. 320-326 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{635fe23eabbc4980a96193a6e97e3198,
title = "A hybrid approach to optimize feature selection process in text classification",
abstract = "Feature selection and weighting are the primary activities of every learning algorithm for text classification. Traditionally, these tasks are carried out separately in two distinct phases: the first is global feature selection during corpus pre-processing, and the second is the application of the feature weighting model. This means that two (or more) different techniques are used to optimize performance, even though a single algorithm may have a better chance of making the right choices. When the complete feature set is available, the classifier learning algorithm can better relate complex features, such as linguistic ones (e.g. syntactic categories associated with words in the training material, or terminological expressions), to the suitable representation level. In [3] it was suggested that classifiers based on the generalized Rocchio formula can be used to weight features in category profiles in order to exploit the selectivity of linguistic information techniques in text classification. This paper describes a systematic study aimed at understanding the role of the Rocchio formula in the selection and weighting of linguistic features.",
author = "Roberto Basili and Alessandro Moschitti and Pazienza, {Maria Teresa}",
year = "2001",
language = "English",
isbn = "3540426019",
volume = "2175",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "320--326",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - A hybrid approach to optimize feature selection process in text classification

AU - Basili, Roberto

AU - Moschitti, Alessandro

AU - Pazienza, Maria Teresa

PY - 2001

Y1 - 2001

N2 - Feature selection and weighting are the primary activities of every learning algorithm for text classification. Traditionally, these tasks are carried out separately in two distinct phases: the first is global feature selection during corpus pre-processing, and the second is the application of the feature weighting model. This means that two (or more) different techniques are used to optimize performance, even though a single algorithm may have a better chance of making the right choices. When the complete feature set is available, the classifier learning algorithm can better relate complex features, such as linguistic ones (e.g. syntactic categories associated with words in the training material, or terminological expressions), to the suitable representation level. In [3] it was suggested that classifiers based on the generalized Rocchio formula can be used to weight features in category profiles in order to exploit the selectivity of linguistic information techniques in text classification. This paper describes a systematic study aimed at understanding the role of the Rocchio formula in the selection and weighting of linguistic features.

AB - Feature selection and weighting are the primary activities of every learning algorithm for text classification. Traditionally, these tasks are carried out separately in two distinct phases: the first is global feature selection during corpus pre-processing, and the second is the application of the feature weighting model. This means that two (or more) different techniques are used to optimize performance, even though a single algorithm may have a better chance of making the right choices. When the complete feature set is available, the classifier learning algorithm can better relate complex features, such as linguistic ones (e.g. syntactic categories associated with words in the training material, or terminological expressions), to the suitable representation level. In [3] it was suggested that classifiers based on the generalized Rocchio formula can be used to weight features in category profiles in order to exploit the selectivity of linguistic information techniques in text classification. This paper describes a systematic study aimed at understanding the role of the Rocchio formula in the selection and weighting of linguistic features.

UR - http://www.scopus.com/inward/record.url?scp=84949983181&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84949983181&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84949983181

SN - 3540426019

SN - 9783540426011

VL - 2175

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 320

EP - 326

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

PB - Springer Verlag

ER -