Using micro-documents for feature selection: The case of ordinal text classification

Stefano Baccianella, Andrea Esuli, Fabrizio Sebastiani

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Most popular feature selection methods for text classification (TC) are based on binary information concerning the presence/absence of the feature in each training document. As such, these methods do not exploit term frequency information. To overcome this drawback, we break down each training document of length k into k training "micro-documents", each consisting of a single word occurrence and endowed with the class information of the original training document. We study the impact of this strategy in the case of ordinal TC; the experiments show that this strategy substantially improves effectiveness.
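The splitting step described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`to_micro_documents`, `presence_counts`, `label_variance`), the toy reviews, and the 1-to-5 star label scale are all hypothetical, and `label_variance` is only a simple stand-in for an ordinal-aware feature scoring function, not one of the functions evaluated in the paper. What the sketch shows is that counting the micro-documents of class c that contain term t gives the total number of occurrences of t in class-c documents, so any scorer built on presence/absence counts implicitly becomes term-frequency-aware.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# A training example: (tokens, ordinal class label), e.g. a 1..5 star rating.
Document = Tuple[List[str], int]


def to_micro_documents(training_set: List[Document]) -> List[Document]:
    """Split each document of length k into k single-token micro-documents,
    each inheriting the class label of its parent document."""
    micro: List[Document] = []
    for tokens, label in training_set:
        for token in tokens:
            micro.append(([token], label))
    return micro


def presence_counts(training_set: List[Document]) -> Dict[str, Counter]:
    """For each term t and class c, count the documents of class c in which
    t is present (binary presence/absence, as in standard feature selection)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for tokens, label in training_set:
        for term in set(tokens):  # set(): each document counts at most once
            counts[term][label] += 1
    return counts


def label_variance(class_counts: Counter) -> float:
    """Hypothetical ordinal-aware score: variance of the labels of the
    (micro-)documents containing a term. Lower variance suggests the term
    concentrates on a narrow region of the ordinal scale."""
    n = sum(class_counts.values())
    if n == 0:
        raise ValueError("term occurs in no training (micro-)document")
    mean = sum(label * c for label, c in class_counts.items()) / n
    return sum(c * (label - mean) ** 2 for label, c in class_counts.items()) / n


if __name__ == "__main__":
    # Hypothetical toy training set of tokenized reviews.
    docs: List[Document] = [
        (["good", "good", "good", "plot"], 5),
        (["good", "bad", "plot"], 3),
        (["bad", "bad", "boring"], 1),
    ]
    doc_counts = presence_counts(docs)
    micro_counts = presence_counts(to_micro_documents(docs))
    for term in ("good", "bad"):
        print(term,
              dict(doc_counts[term]), label_variance(doc_counts[term]),
              dict(micro_counts[term]), label_variance(micro_counts[term]))
```

On this toy data, `good` occurs in one 5-star and one 3-star document, so document-level counts weigh those two classes equally; at micro-document level its three occurrences in the 5-star review outweigh the single 3-star occurrence, and its label variance drops from 1.0 to 0.75.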

Original language: English
Title of host publication: CEUR Workshop Proceedings
Volume: 704
Publication status: Published - 2011
Externally published: Yes
Event: 2nd Italian Information Retrieval Workshop, IIR 2011 - Milan, Italy
Duration: 27 Jan 2011 to 28 Jan 2011



ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Baccianella, S., Esuli, A., & Sebastiani, F. (2011). Using micro-documents for feature selection: The case of ordinal text classification. In CEUR Workshop Proceedings (Vol. 704)

@inproceedings{4ea1ce13b5504d70a9a755e8100a8566,
title = "Using micro-documents for feature selection: The case of ordinal text classification",
abstract = "Most popular feature selection methods for text classification (TC) are based on binary information concerning the presence/absence of the feature in each training document. As such, these methods do not exploit term frequency information. To overcome this drawback, we break down each training document of length k into k training {"}micro-documents{"}, each consisting of a single word occurrence and endowed with the class information of the original training document. We study the impact of this strategy in the case of ordinal TC; the experiments show that this strategy substantially improves effectiveness.",
author = "Stefano Baccianella and Andrea Esuli and Fabrizio Sebastiani",
year = "2011",
language = "English",
volume = "704",
booktitle = "CEUR Workshop Proceedings",

}

TY - GEN

T1 - Using micro-documents for feature selection

T2 - The case of ordinal text classification

AU - Baccianella, Stefano

AU - Esuli, Andrea

AU - Sebastiani, Fabrizio

PY - 2011

Y1 - 2011

AB - Most popular feature selection methods for text classification (TC) are based on binary information concerning the presence/absence of the feature in each training document. As such, these methods do not exploit term frequency information. To overcome this drawback, we break down each training document of length k into k training "micro-documents", each consisting of a single word occurrence and endowed with the class information of the original training document. We study the impact of this strategy in the case of ordinal TC; the experiments show that this strategy substantially improves effectiveness.

UR - http://www.scopus.com/inward/record.url?scp=84890660565&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84890660565&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84890660565

VL - 704

BT - CEUR Workshop Proceedings

ER -