Abstract
Feature selection and weighting are the primary activities of every learning algorithm for text classification. Traditionally these tasks are carried out separately in two distinct phases: first, global feature selection during corpus pre-processing, and second, application of the feature weighting model. This means that two (or more) different techniques are used to optimize performance, even though a single algorithm may have a better chance of making the right choices. When the complete feature set is available, the classifier learning algorithm can better relate complex features, such as linguistic ones (e.g. syntactic categories associated with words in the training material, or terminological expressions), to the suitable representation level. In [3] it was suggested that classifiers based on the generalized Rocchio formula can be used to weight features in category profiles in order to exploit the selectivity of linguistic information techniques in text classification. This paper describes a systematic study aimed at understanding the role of the Rocchio formula in the selection and weighting of linguistic features.
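The abstract does not state the formula itself. As a rough illustration of how a Rocchio-style profile can weight and implicitly select features in one step, the sketch below uses the textbook form, weight(f) = max(0, beta * mean weight of f in the category's documents - gamma * mean weight of f in the other documents). The function name `rocchio_profile`, the dictionary-based document vectors, and the default `beta` and `gamma` values are assumptions made for this example, not details taken from the paper.

```python
from typing import Dict, List

Vector = Dict[str, float]  # feature -> weight within one document


def rocchio_profile(positives: List[Vector], negatives: List[Vector],
                    beta: float = 1.0, gamma: float = 1.0) -> Vector:
    """Build a category profile with the textbook Rocchio formula.

    beta and gamma are illustrative defaults, not values from the paper.
    """
    # Collect every feature seen in either document set.
    features = set()
    for doc in positives + negatives:
        features.update(doc)

    profile: Vector = {}
    for f in features:
        pos_mean = (sum(d.get(f, 0.0) for d in positives) / len(positives)
                    if positives else 0.0)
        neg_mean = (sum(d.get(f, 0.0) for d in negatives) / len(negatives)
                    if negatives else 0.0)
        weight = max(0.0, beta * pos_mean - gamma * neg_mean)
        if weight > 0.0:
            # Features clipped to zero are left out of the profile,
            # so weighting also acts as feature selection.
            profile[f] = weight
    return profile


if __name__ == "__main__":
    finance_docs = [{"stock": 1.0, "the": 1.0}]
    other_docs = [{"goal": 1.0, "the": 1.0}]
    print(rocchio_profile(finance_docs, other_docs))
```

In this toy input, a feature that is equally frequent inside and outside the category ("the") receives a zero weight and is dropped, while a feature specific to the category ("stock") is kept. Raising `gamma` relative to `beta` makes the implicit selection more aggressive.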
Original language | English |
---|---|
Title of host publication | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Publisher | Springer Verlag |
Pages | 320-326 |
Number of pages | 7 |
Volume | 2175 |
ISBN (Print) | 3540426019, 9783540426011 |
Publication status | Published - 2001 |
Externally published | Yes |
Event | 7th Congress of the Italian Association for Artificial Intelligence, AIIA 2001, Bari, Italy (25 Sep 2001 → 28 Sep 2001) |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 2175 |
ISSN (Print) | 03029743 |
ISSN (Electronic) | 16113349 |
Other
Other | 7th Congress of the Italian Association for Artificial Intelligence, AIIA 2001 |
---|---|
Country | Italy |
City | Bari |
Period | 25/9/01 → 28/9/01 |
ASJC Scopus subject areas
- Computer Science(all)
- Theoretical Computer Science
Cite this
A hybrid approach to optimize feature selection process in text classification. / Basili, Roberto; Moschitti, Alessandro; Pazienza, Maria Teresa.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 2175. Springer Verlag, 2001. p. 320-326.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - A hybrid approach to optimize feature selection process in text classification
AU - Basili, Roberto
AU - Moschitti, Alessandro
AU - Pazienza, Maria Teresa
PY - 2001
Y1 - 2001
UR - http://www.scopus.com/inward/record.url?scp=84949983181&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84949983181&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84949983181
SN - 3540426019
SN - 9783540426011
VL - 2175
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 320
EP - 326
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
PB - Springer Verlag
ER -