Most popular feature selection methods for text classification (TC) rely only on binary information: whether or not a feature is present in each training document. As a result, these methods do not exploit term frequency information. To overcome this drawback, we split each training document of length k (i.e., containing k word occurrences) into k training "micro-documents", each consisting of a single word occurrence and labelled with the class of the original training document. We study the impact of this strategy on ordinal TC; the experiments show that it substantially improves effectiveness.
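The micro-document transformation can be sketched as follows. This is an illustrative sketch, not the authors' code. It assumes documents are already tokenized into lists of words and that class labels are ordinal values, such as hypothetical 1 to 5 star ratings. The helper `presence_counts` is a hypothetical stand-in for the binary presence/absence statistic that feature selection functions are built on. It shows that once documents are split, a document count computed on the micro-documents equals the term frequency in the original documents. The specific feature selection functions evaluated in the paper are not reproduced here.

```python
from collections import Counter
from typing import Hashable, List, Tuple

# A training document: (list of tokens, class label). Labels are assumed ordinal,
# e.g. hypothetical star ratings 1..5.
Document = Tuple[List[str], Hashable]


def explode_into_micro_documents(training_set: List[Document]) -> List[Document]:
    """Replace each document of length k with k single-token "micro-documents",
    each inheriting the class label of its parent document."""
    micro_docs: List[Document] = []
    for tokens, label in training_set:
        for token in tokens:
            micro_docs.append(([token], label))
    return micro_docs


def presence_counts(training_set: List[Document]) -> Counter:
    """Binary statistic (hypothetical helper): for each (term, class) pair, count
    the documents of that class in which the term occurs at least once."""
    counts: Counter = Counter()
    for tokens, label in training_set:
        for term in set(tokens):  # set() discards repeated occurrences
            counts[(term, label)] += 1
    return counts


train = [
    (["good", "good", "plot"], 5),
    (["bad", "plot"], 1),
]
micro = explode_into_micro_documents(train)

print(len(micro))                            # 5 micro-documents from 3 + 2 tokens
print(presence_counts(train)[("good", 5)])   # 1: "good" appears in one class-5 document
print(presence_counts(micro)[("good", 5)])   # 2: now equals its term frequency
```

Any feature selection function that consumes only presence/absence counts can therefore be applied unchanged to the micro-document collection and will implicitly take term frequency into account.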
| Journal | CEUR Workshop Proceedings |
|---|---|
| Publication status | Published, 1 Dec 2011 |
| Event | 2nd Italian Information Retrieval Workshop (IIR 2011), Milan, Italy |
| Duration | 27–28 Jan 2011 |
| ASJC Scopus subject areas | Computer Science (all) |