Using micro-documents for feature selection: The case of ordinal text classification

Stefano Baccianella, Andrea Esuli, Fabrizio Sebastiani

Research output: Contribution to journal › Conference article

Abstract

Most popular feature selection methods for text classification (TC) are based on binary information concerning the presence/absence of the feature in each training document. As such, these methods do not exploit term frequency information. In order to overcome this drawback, we break down each training document of length k into k training "micro-documents", each consisting of a single word occurrence and endowed with the class information of the original training document. We study the impact of this strategy in the case of ordinal TC; the experiments show that this strategy substantially improves effectiveness.
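The micro-document idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy training set, and the simple presence count standing in for a real feature selection score are all assumptions made for illustration. It shows only that a presence-based count computed over micro-documents reflects how often a term occurs, whereas the same count over the original documents does not.

```python
from collections import Counter


def to_micro_documents(tokens, label):
    """Split one labelled document of length k into k single-token documents.

    Each micro-document inherits the label of the document it came from.
    """
    return [([token], label) for token in tokens]


def presence_counts(documents):
    """Count, per (term, label), how many documents contain the term.

    This uses only binary presence/absence information, as the feature
    selection functions mentioned in the abstract do.
    """
    counts = Counter()
    for tokens, label in documents:
        for term in set(tokens):  # set(): a term counts once per document
            counts[(term, label)] += 1
    return counts


# Hypothetical toy training set: tokenised reviews with ordinal labels 1..5.
training = [
    (["great", "great", "room", "great"], 5),
    (["bad", "room"], 1),
]

# Expand every training document into its micro-documents.
micro = [
    micro_doc
    for tokens, label in training
    for micro_doc in to_micro_documents(tokens, label)
]

original_counts = presence_counts(training)
micro_counts = presence_counts(micro)

print(original_counts[("great", 5)])  # presence only: one document
print(micro_counts[("great", 5)])     # one count per occurrence
```

Because each micro-document holds exactly one token, any presence-based statistic computed over the expanded set is driven by occurrence counts, so an existing binary feature selection function can be reused unchanged.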

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 704
Publication status: Published - 1 Dec 2011
Event: 2nd Italian Information Retrieval Workshop, IIR 2011 - Milan, Italy
Duration: 27 Jan 2011 - 28 Jan 2011

ASJC Scopus subject areas

  • Computer Science (all)
